INFORMATION SYSTEMS FOR BIG DATA

0222700009
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS
EQF7
DATA SCIENCE AND INNOVATION MANAGEMENT
2021/2022

COMPULSORY
YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2020
FIRST SEMESTER
CFU   HOURS   ACTIVITY
3     21      LESSONS
3     21      LAB
Objectives
KNOWLEDGE AND UNDERSTANDING:
THE COURSE INTRODUCES KEY CONCEPTS AND TECHNOLOGIES FOR BIG DATA-ENABLED ARCHITECTURES. THESE CONCEPTS WILL BE DEVELOPED MAINLY THROUGH THE STUDY OF SOME KEY TECHNOLOGICAL FRAMEWORKS. THE COURSE WILL INTRODUCE DISTRIBUTED PROCESSING OF BIG DATA IN BATCH OR STREAMING MODE (HADOOP, SPARK, ETC.); STORAGE OF UNSTRUCTURED OR SEMI-STRUCTURED DATA IN NOSQL DATABASES (MONGODB, NEO4J, ETC.); AND SERIALIZATION AND EXCHANGE OF BIG DATA (JSON, ETC.). FINALLY, SOME INTRODUCTORY NOTES WILL BE PROVIDED ON DEVELOPING WEB APPLICATIONS THAT DISPLAY ANALYTICS, I.E. DASHBOARDS; TO THIS END, THE COURSE WILL CONSIDER D3.JS AND THE MOST WIDELY USED TECHNOLOGY STACKS: APACHE SOLR WITH BANANA, AND ELASTICSEARCH WITH KIBANA.

ABILITY TO APPLY KNOWLEDGE AND UNDERSTANDING:
AT THE END OF THE COURSE, THE STUDENT WILL BE ABLE TO MASTER THE MAJOR TECHNOLOGICAL TOOLS FOR ACQUIRING, STORING, PROCESSING AND ANALYZING BIG DATA.
THE COURSE AIMS TO ENABLE THE STUDENT TO WORK IN A GROUP AND TO CARRY OUT SOME OF THE ACTIVITIES OF A PROJECT THAT IMPLEMENTS BIG DATA ANALYTICS CAPABILITIES IN A CHOSEN AREA, SUCH AS SOCIAL MEDIA, CLOUD STORAGE, SMART ENVIRONMENTS, ENTERPRISE DOCUMENT MANAGEMENT, ETC.

COMMUNICATION SKILLS:
COMMUNICATION SKILLS WILL BE DEVELOPED BY SHARING WITH THE OTHER TEAM MEMBERS (AND WITH THE TEACHER) THE MAIN RESEARCH FINDINGS, THE ARCHITECTURAL AND TECHNOLOGICAL CHOICES AND THE MEANING OF THE EXTRACTED ANALYTICS.
STUDENTS ACQUIRE THESE SKILLS BY PARTICIPATING IN THE PROJECT WORK, IN WHICH THEY APPLY THE ACQUIRED KNOWLEDGE, AND BY ORGANIZING THE ORAL PRESENTATION OF THEIR WORK.

MAKING JUDGMENTS:
STUDENTS ARE GUIDED TO LEARN CRITICALLY AND RESPONSIBLY WHAT IS EXPLAINED TO THEM IN THE CLASSROOM AND TO ENRICH THEIR JUDGMENT SKILLS THROUGH THE STUDY OF THE LEARNING MATERIAL INDICATED BY THE TEACHER. INDEPENDENT JUDGMENT IS ALSO DEVELOPED THROUGH GROUP WORK AND DISCUSSIONS WITH OTHER MEMBERS OF THE PROJECT TEAM.
Prerequisites
IT IS DESIRABLE THAT STUDENTS KNOW: THE BASIC CONCEPTS OF ALGORITHMS AND DATA STRUCTURES; AT LEAST ONE PROGRAMMING LANGUAGE AMONG JAVA, PYTHON AND SCALA, WELL ENOUGH TO WRITE SIMPLE PROGRAMS; THE BASICS OF DATABASES AND SQL.
Contents
AFTER A BRIEF INTRODUCTION TO THE MAIN LEARNING OBJECTIVES OF THE COURSE, STUDENTS WILL BE INTRODUCED TO THE BIG DATA WORLD.
IN THE EARLY PART OF THE COURSE, STUDENTS WILL BE ENCOURAGED TO WORK IN TEAMS TO DEFINE A PROJECT IN WHICH THEY APPLY THE KNOWLEDGE ACQUIRED DURING THE CLASSES, FOLLOWING A STEP-BY-STEP APPROACH.
THE COURSE WILL BE COMPOSED OF THE FOLLOWING MAIN PARTS.

(4 HOURS) INTRODUCTION TO BIG DATA-ENABLED ARCHITECTURES
BIG DATA LANDSCAPE
REQUIREMENTS OF A BIG DATA INFORMATION SYSTEM
LAMBDA AND KAPPA ARCHITECTURE
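
AS AN ILLUSTRATION OF THE LAMBDA ARCHITECTURE TOPIC ABOVE, THE FOLLOWING IS A MINIMAL IN-MEMORY SKETCH IN PYTHON, NOT A PRODUCTION DESIGN. IT ASSUMES A PAGE-VIEW COUNTING USE CASE; THE DATA AND ALL FUNCTION AND VARIABLE NAMES ARE HYPOTHETICAL. A BATCH VIEW IS RECOMPUTED FROM THE FULL DATASET, A REAL-TIME VIEW COVERS ONLY THE EVENTS THAT ARRIVED AFTER THE LAST BATCH RUN, AND A QUERY MERGES THE TWO.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs; names are illustrative.
master_dataset = [(1, "home"), (2, "about"), (3, "home")]  # covered by the last batch run
recent_events = [(4, "home"), (5, "contact")]              # arrived after the last batch run


def batch_view(events):
    """Batch layer: recompute page-view counts from the full master dataset."""
    return Counter(page for _, page in events)


def realtime_view(events):
    """Speed layer: count only the events not yet covered by a batch run."""
    return Counter(page for _, page in events)


def query(page, batch, realtime):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]


batch = batch_view(master_dataset)
realtime = realtime_view(recent_events)
print(query("home", batch, realtime))
```

IN A KAPPA ARCHITECTURE THE BATCH PATH IS DROPPED: THE SAME VIEW WOULD BE OBTAINED BY REPLAYING THE WHOLE EVENT LOG THROUGH THE STREAMING CODE ALONE.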


(4 HOURS, ONE OF WHICH IS A LABORATORY ACTIVITY) ACQUISITION
DATA SERIALIZATION AND EXCHANGE FORMATS: JSON, AVRO, PARQUET, ETC.
REST AND STREAM API FOR ACCESSING TWITTER, DROPBOX, ETC.
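
AS A SMALL ILLUSTRATION OF THE SERIALIZATION TOPIC ABOVE, THE FOLLOWING SKETCH SHOWS A JSON ROUND TRIP USING ONLY THE PYTHON STANDARD LIBRARY. THE RECORD IS A HYPOTHETICAL EXAMPLE OF DATA RETRIEVED FROM A REST API; ITS FIELDS ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
import json

# Hypothetical record, e.g. a social media post retrieved from a REST API.
record = {"id": 42, "user": "alice", "tags": ["bigdata", "spark"], "shares": 3}

# Serialize to a JSON string; sorted keys make the output deterministic.
payload = json.dumps(record, sort_keys=True)

# Deserialize back into Python objects; the round trip preserves the content.
restored = json.loads(payload)

print(payload)
print(restored == record)
```

AVRO AND PARQUET ARE BINARY, SCHEMA-BASED FORMATS (ROW-ORIENTED AND COLUMN-ORIENTED RESPECTIVELY) AND REQUIRE THIRD-PARTY LIBRARIES, SO THEY ARE NOT SHOWN HERE.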

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED PROCESSING
HADOOP AND RELATED TECHNOLOGIES.
SPARK, AND OTHER BIG DATA PROCESSING ENGINES.
HANDS ON SPARK DATAFRAME
HANDS ON SPARK MACHINE LEARNING
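
TO MAKE THE DISTRIBUTED PROCESSING TOPIC ABOVE MORE CONCRETE, THE FOLLOWING IS A MINIMAL SKETCH OF THE MAPREDUCE PROGRAMMING MODEL THAT UNDERLIES HADOOP. IT RUNS IN A SINGLE PYTHON PROCESS AND DOES NOT USE THE HADOOP OR SPARK APIS; THE FUNCTION NAMES ARE ILLUSTRATIVE ASSUMPTIONS. IN A REAL CLUSTER THE MAP AND REDUCE CALLS WOULD RUN IN PARALLEL ON DIFFERENT NODES.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big ideas", "data systems"])
print(counts)
```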

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) STORAGE
INTRODUCTION TO NOSQL DATABASES, SUCH AS KEY-VALUE STORES, DOCUMENT-ORIENTED DATABASES, COLUMN-ORIENTED DATABASES AND GRAPH DATABASES.
HANDS ON MONGODB
HANDS ON NEO4J

(10 HOURS, FOUR OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED STREAM PROCESSING
INTRODUCTION TO DISTRIBUTED DATA STREAM PROCESSING.
APACHE STORM, SPARK STREAMING, KAFKA STREAMS
HANDS ON SPARK STREAMING
HANDS ON KAFKA STREAMS
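
AS AN ILLUSTRATION OF THE STREAM PROCESSING TOPIC ABOVE, THE FOLLOWING SKETCH COUNTS EVENTS PER KEY IN FIXED, NON-OVERLAPPING (TUMBLING) TIME WINDOWS, A COMMON OPERATION IN STREAM PROCESSING ENGINES. IT IS PLAIN PYTHON AND DOES NOT USE THE STORM, SPARK STREAMING OR KAFKA STREAMS APIS; THE EVENT DATA AND THE FUNCTION NAME ARE HYPOTHETICAL.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns a dict mapping each window start to a Counter of keys.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align the timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


# Hypothetical stream of (timestamp, hashtag) events.
stream = [(0, "#spark"), (3, "#kafka"), (9, "#spark"), (10, "#spark"), (14, "#kafka")]
result = tumbling_window_counts(stream, window_seconds=10)
print(result)
```

A REAL ENGINE WOULD ALSO HAVE TO HANDLE UNBOUNDED INPUT, LATE EVENTS AND FAULT TOLERANCE, WHICH THIS SKETCH IGNORES.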

(4 HOURS, TWO OF WHICH ARE LABORATORY ACTIVITIES) BIG DATA ANALYTICS
INTRODUCTION TO ANALYTICS VISUALIZATION THROUGH A WEB APPLICATION CONSIDERING D3.JS AND THE MOST USED TECHNOLOGICAL STACKS: APACHE SOLR AND BANANA, ELASTICSEARCH AND KIBANA
HANDS ON APACHE SOLR AND BANANA
Teaching Methods
THE COURSE AIMS TO ENCOURAGE STUDENTS TOWARD LIFELONG LEARNING, WHICH INVOLVES THE CONTINUOUS UPDATING (THROUGHOUT LIFE) OF KNOWLEDGE AND SKILLS, AND TO STIMULATE CURIOSITY AND INTEREST IN INFORMATION TECHNOLOGY AND THE NEW TECHNOLOGIES PERTAINING TO THE SUBJECT OF THE COURSE.
TO GET THEM USED TO SELF-LEARNING, STUDENTS WILL BE INVITED TO STUDY THE TOPICS OF THE COURSE IN GREATER DEPTH AND WILL BE OFFERED ACCESS TO ONLINE RESOURCES OF PARTICULAR INTEREST.
DURING THE COURSE THE TEACHER WILL MAKE AMPLE USE OF EXAMPLES AND GUIDED EXERCISES.
FROM A STRUCTURAL POINT OF VIEW, THE LESSONS WILL CONSIST OF:
(21 HOURS) FRONTAL LESSONS.
(21 HOURS) LABORATORY ACTIVITIES.
Verification of learning
THE ACHIEVEMENT OF THE TEACHING OBJECTIVES IS CERTIFIED BY PASSING AN EXAM GRADED ON A SCALE OF THIRTY.
THE EXAM IS DIVIDED INTO TWO PARTS: A "THEORETICAL" AND A "PRACTICAL" TEST. TO PASS THE EXAM, EACH PART MUST BE PASSED WITH AT LEAST A SUFFICIENT GRADE; OTHERWISE, THE EXAM IS NOT PASSED. THE FINAL GRADE (IF BOTH PARTS ARE PASSED) IS GIVEN BY THE SUM OF THE GRADES OF THE TWO PARTS.
FIRST PART: THE "THEORETICAL" ASSESSMENT CONSISTS OF A PRESENTATION BY THE STUDENT ON A TOPIC OF INTEREST (PERTINENT TO THE COURSE) FROM A TECHNOLOGICAL, METHODOLOGICAL AND/OR APPLICATION POINT OF VIEW, BASED ON RESEARCH CARRIED OUT INDIVIDUALLY AND CRITICALLY, WITH APPROPRIATE CONNECTIONS AND PARALLELS TO THE THEMES STUDIED DURING THE COURSE.
SECOND PART: THE "PRACTICAL" ASSESSMENT CONCERNS A PROJECT CARRIED OUT BY THE STUDENT WITHIN A TEAM, WHICH AIMS TO USE SOME OF THE TECHNOLOGIES STUDIED DURING THE COURSE AND/OR THOSE THAT EMERGED FROM THE INDIVIDUAL RESEARCH.
Texts
MARZ, N., & WARREN, J. (2015). BIG DATA: PRINCIPLES AND BEST PRACTICES OF SCALABLE REAL-TIME DATA SYSTEMS. NEW YORK: MANNING PUBLICATIONS CO.

SUGGESTED READINGS:

BAHGA, A., & MADISETTI, V. (2016). BIG DATA SCIENCE & ANALYTICS: A HANDS-ON APPROACH. VPT.

More Information
LINKS TO ADDITIONAL MATERIAL AND TEACHING MATERIALS WILL BE PROVIDED.