INFORMATION SYSTEMS FOR BIG DATA

0222700009
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS
EQF7
DATA SCIENCE AND INNOVATION MANAGEMENT
2022/2023

YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2020
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
3      21       LESSONS
3      21       LAB
Objectives
THE COURSE AIMS TO INTRODUCE FUNDAMENTAL CONCEPTS, REQUIREMENTS, TECHNOLOGIES, AND REFERENCE ARCHITECTURES FOR DEFINING AND IMPLEMENTING BIG DATA-ORIENTED INFORMATION SYSTEMS.
SKILLS WILL BE LEARNED BY STUDYING EXISTING TECHNOLOGICAL FRAMEWORKS FOR ACQUISITION; STORAGE THROUGH NOSQL DATABASES (SOLR, MONGODB, NEO4J, ETC.) AND BIG DATA FILE FORMATS (AVRO, PARQUET, ETC.); AND DISTRIBUTED PROCESSING, BOTH IN BATCH AND STREAM MODE (HADOOP, SPARK, ETC.), WITH THE AIM OF COMPUTING ANALYTICS FROM UNSTRUCTURED OR SEMI-STRUCTURED RESOURCES IN A SCALABLE MANNER.
AN INTRODUCTION WILL BE PROVIDED TO WEB APPLICATIONS FOR ANALYTICS VISUALIZATION, INCLUDING D3.JS AND TECHNOLOGY STACKS SUCH AS APACHE SOLR+BANANA AND ELASTICSEARCH+KIBANA.
AT THE END OF THE COURSE, THE STUDENT WILL BE ABLE TO USE THE MAIN TECHNOLOGICAL TOOLS FOR ACQUIRING, STORING, PROCESSING, AND ANALYZING BIG DATA. FURTHERMORE, THE STUDENT WILL BE ENCOURAGED TO CARRY OUT GROUP WORK AND APPLY THE ACQUIRED KNOWLEDGE TO IMPLEMENT A PROJECT EXHIBITING BIG DATA ANALYTICS FUNCTIONALITIES IN A CHOSEN FIELD (E.G., SOCIAL MEDIA, WEB INTELLIGENCE, SMART ENVIRONMENT, ETC.). THE OBJECTIVE IS TO EXERCISE THE ABILITY TO SELECT AND ADOPT SUITABLE TECHNOLOGIES ACCORDING TO THE HETEROGENEOUS REQUIREMENTS OF THE PROJECT CONTEXT.
Prerequisites
IT IS DESIRABLE THAT STUDENTS KNOW: THE BASIC CONCEPTS OF ALGORITHMS AND DATA STRUCTURES; AT LEAST ONE PROGRAMMING LANGUAGE AMONG JAVA, PYTHON, AND SCALA, WELL ENOUGH TO WRITE SIMPLE PROGRAMS; THE BASICS OF DATABASES AND SQL.
Contents
AFTER A BRIEF INTRODUCTION TO THE MAIN LEARNING OBJECTIVES OF THE COURSE, STUDENTS WILL BE INTRODUCED TO THE BIG DATA WORLD.
EARLY IN THE COURSE, STUDENTS WILL BE ENCOURAGED TO WORK IN TEAMS TO DEFINE A PROJECT IN WHICH TO APPLY THE KNOWLEDGE ACQUIRED DURING THE CLASSES, FOLLOWING A STEP-BY-STEP APPROACH.
THE COURSE WILL BE COMPOSED OF THE FOLLOWING MAIN PARTS.

(4 HOURS) INTRODUCTION TO BIG DATA-ENABLED ARCHITECTURES
BIG DATA LANDSCAPE
REQUIREMENTS OF BIG DATA INFORMATION SYSTEMS
LAMBDA AND KAPPA ARCHITECTURE
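
AS A MINIMAL SKETCH OF THE LAMBDA ARCHITECTURE TOPIC ABOVE, THE FOLLOWING TOY EXAMPLE ANSWERS A QUERY BY MERGING A BATCH VIEW (RECOMPUTED FROM THE FULL DATASET) WITH A REAL-TIME VIEW (UPDATED INCREMENTALLY). THE FUNCTION NAMES, THE PAGE-VIEW EVENTS, AND THE IN-MEMORY COUNTERS ARE HYPOTHETICAL AND CHOSEN ONLY FOR ILLUSTRATION; A REAL SYSTEM WOULD USE DISTRIBUTED STORAGE AND PROCESSING.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute page-view counts from the full, immutable dataset."""
    return Counter(event["page"] for event in master_dataset)


def update_speed_view(speed_view, event):
    """Speed layer: incrementally count an event not yet absorbed by the batch view."""
    speed_view[event["page"]] += 1


def query(page, batch_view, speed_view):
    """Serving side: merge the batch view with the real-time view."""
    return batch_view[page] + speed_view[page]


# Hypothetical historical events already processed by the batch layer.
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
batch_view = build_batch_view(master_dataset)

# Hypothetical recent events that arrived after the last batch run.
speed_view = Counter()
for recent_event in [{"page": "/home"}, {"page": "/blog"}]:
    update_speed_view(speed_view, recent_event)

print(query("/home", batch_view, speed_view))  # 2 from batch + 1 from speed = 3
```

IN A KAPPA-STYLE DESIGN THE BATCH LAYER WOULD BE DROPPED, AND THE SAME INCREMENTAL LOGIC WOULD REBUILD THE VIEW BY REPLAYING THE EVENT LOG.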


(4 HOURS, ONE OF WHICH IS A LABORATORY ACTIVITY) ACQUISITION
SERIALIZATION AND DATA EXCHANGE FORMATS: JSON, AVRO, PARQUET, ETC.
REST AND STREAMING APIS FOR ACCESSING TWITTER, DROPBOX, ETC.

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED PROCESSING
HADOOP AND RELATED TECHNOLOGIES.
SPARK, AND OTHER BIG DATA PROCESSING ENGINES.
HANDS ON SPARK DATAFRAME
HANDS ON SPARK MACHINE LEARNING
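
AS A MINIMAL SKETCH OF THE MAPREDUCE PROGRAMMING MODEL UNDERLYING HADOOP, THE FOLLOWING SINGLE-PROCESS EXAMPLE SIMULATES THE MAP, SHUFFLE, AND REDUCE PHASES OF A WORD COUNT. IT IS NOT HADOOP OR SPARK CODE: THE FUNCTION NAMES AND THE INPUT LINES ARE HYPOTHETICAL, AND NOTHING IS DISTRIBUTED.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input lines.
counts = word_count(["big data systems", "Big Data streams"])
print(counts)
```

IN A REAL CLUSTER THE MAP AND REDUCE FUNCTIONS RUN ON DIFFERENT NODES AND THE SHUFFLE MOVES DATA OVER THE NETWORK; THE LOGIC OF EACH PHASE STAYS THE SAME.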

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) STORAGE
INTRODUCTION TO NOSQL DATABASES, SUCH AS KEY-VALUE STORES, DOCUMENT-ORIENTED DATABASES, COLUMN-ORIENTED DATABASES, AND GRAPH DATABASES.
HANDS ON MONGODB
HANDS ON NEO4J

(10 HOURS, FOUR OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED STREAM PROCESSING
INTRODUCTION TO DISTRIBUTED DATA STREAM PROCESSING.
APACHE STORM, SPARK STREAMING, KAFKA STREAMS
HANDS ON SPARK STREAMING
HANDS ON KAFKA STREAMS

(4 HOURS, TWO OF WHICH ARE LABORATORY ACTIVITIES) BIG DATA ANALYTICS
INTRODUCTION TO ANALYTICS VISUALIZATION THROUGH A WEB APPLICATION, CONSIDERING D3.JS AND THE MOST WIDELY USED TECHNOLOGY STACKS: APACHE SOLR AND BANANA, ELASTICSEARCH AND KIBANA
HANDS ON APACHE SOLR AND BANANA
Teaching Methods
THE COURSE AIMS TO ENCOURAGE STUDENTS TOWARD LIFELONG LEARNING, THAT IS, THE CONTINUOUS UPDATING OF KNOWLEDGE AND SKILLS THROUGHOUT LIFE, BY STIMULATING CURIOSITY AND INTEREST IN INFORMATION TECHNOLOGY AND IN NEW TECHNOLOGIES PERTAINING TO THE SUBJECT MATTER OF THE COURSE.
TO ACCUSTOM STUDENTS TO SELF-LEARNING, THEY WILL BE INVITED TO EXPLORE THE COURSE TOPICS IN GREATER DEPTH AND WILL BE GIVEN ACCESS TO ONLINE RESOURCES OF PARTICULAR INTEREST.
DURING THE COURSE THE TEACHER WILL MAKE AMPLE USE OF EXAMPLES AND GUIDED EXERCISES.
FROM A STRUCTURAL POINT OF VIEW, THE LESSONS WILL CONSIST OF:
(21 HOURS) LECTURES.
(21 HOURS) LABORATORY ACTIVITIES.
Verification of learning
THE ACHIEVEMENT OF THE TEACHING OBJECTIVES IS CERTIFIED BY PASSING AN EXAM GRADED IN THIRTIETHS.
THE EXAM IS DIVIDED INTO TWO PARTS: A "THEORETICAL" AND A "PRACTICAL" TEST. IN ORDER TO PASS THE WHOLE EXAM, EACH PART MUST BE PASSED WITH AT LEAST A SUFFICIENT GRADE. OTHERWISE, THE EXAM IS CONSIDERED NOT PASSED. THE FINAL GRADE (IF SUFFICIENCY IS REACHED IN EACH PART) IS GIVEN BY THE SUM OF THE GRADES OF THE TWO PARTS.
FIRST PART: THE "THEORETICAL" ASSESSMENT CONSISTS OF A STUDENT'S PRESENTATION ON A TOPIC OF INTEREST (PERTINENT TO THE COURSE) FROM THE TECHNOLOGICAL, METHODOLOGICAL AND/OR APPLICATIVE POINT OF VIEW (BASED ON RESEARCH CARRIED OUT INDIVIDUALLY AND CRITICALLY, WITH APPROPRIATE CONNECTIONS AND PARALLELS TO THE THEMES STUDIED DURING THE COURSE);
SECOND PART: THE "PRACTICAL" ASSESSMENT CONCERNS A PROJECT CARRIED OUT BY THE STUDENT WITHIN A TEAM, WHICH AIMS TO USE SOME OF THE TECHNOLOGIES STUDIED DURING THE COURSE AND/OR THOSE THAT EMERGED FROM THE INDIVIDUAL RESEARCH.
Texts
MARZ, N., & WARREN, J. (2015). BIG DATA: PRINCIPLES AND BEST PRACTICES OF SCALABLE REAL-TIME DATA SYSTEMS. NEW YORK: MANNING PUBLICATIONS CO.

SUGGESTED READINGS:

BAHGA, A., & MADISETTI, V. (2016). BIG DATA SCIENCE & ANALYTICS: A HANDS-ON APPROACH. VPT.

More Information
LINKS TO ADDITIONAL MATERIAL AND TEACHING MATERIALS WILL BE PROVIDED.