International Teaching | ADVANCED STATISTICAL MODELLING FOR BIG DATA
cod. 0222400038
DEPARTMENT OF ECONOMICS AND STATISTICS | |
EQF7 | |
STATISTICAL SCIENCES FOR FINANCE | |
2023/2024 |
COMPULSORY | |
YEAR OF COURSE 2 | |
ACADEMIC REGULATIONS YEAR 2014 | |
AUTUMN SEMESTER |
SSD | CFU | HOURS | ACTIVITY |
---|---|---|---|
SECS-S/01 | 10 | 60 | LESSONS |
Objectives | |
---|---|
KNOWLEDGE AND UNDERSTANDING: THE COURSE AIMS TO PROVIDE KNOWLEDGE OF ADVANCED STATISTICAL MODELS USEFUL FOR UNDERSTANDING PROBLEMS AND IMPROVING DECISION-MAKING PROCESSES; KNOWLEDGE OF ADVANCED STATISTICAL MODELS AND STATISTICAL LEARNING TOOLS THAT SUPPORT DECISIONS ABOUT PHENOMENA AND SYSTEMS IN WHICH LARGE AMOUNTS OF DATA, VARIABILITY AND UNCERTAINTY CREATE A LEVEL OF COMPLEXITY THAT TRADITIONAL TECHNIQUES CANNOT HANDLE; AND THE ABILITY TO ANALYSE AND INTERPRET COMPLEX DATA AND TO PRODUCE PREDICTIVE AND ANALYTICAL MODELS THAT SUPPORT COMPANY MANAGEMENT AND CONTROL POLICIES IN THE PUBLIC AND PRIVATE SECTORS. APPLYING KNOWLEDGE AND UNDERSTANDING: ALL STATISTICAL MODELS WILL BE PRESENTED AS PREDICTIVE AND ANALYTICAL/INTERPRETATIVE TOOLS FOR UNDERSTANDING THE PROBLEMS THAT ARISE IN A GENERAL DECISION-MAKING PROCESS. IN PARTICULAR, STUDENTS WILL DEVELOP THE ABILITY TO SPECIFY, ESTIMATE AND VALIDATE A BROAD CLASS OF STATISTICAL MODELS APPLIED TO COMPLEX DATA STRUCTURES. SPECIFIC ATTENTION WILL BE GIVEN TO THE MODERN TOOLS AVAILABLE FOR MANAGING AND ANALYSING BIG DATA AND TO THE STATISTICAL PROGRAMMING LANGUAGES USED TO DEVELOP AND IMPLEMENT EFFECTIVE ANALYTICAL SOLUTIONS. SEVERAL CASE STUDIES WILL BE PRESENTED AND DISCUSSED TO BUILD STUDENTS' ABILITY TO APPLY THEIR KNOWLEDGE TO REAL PROBLEMS AND DATASETS. |
Prerequisites | |
---|---|
KNOWLEDGE OF CALCULUS, MATRIX CALCULUS, BASIC PROGRAMMING, THE R STATISTICAL LANGUAGE, PROBABILITY AND STATISTICAL INFERENCE IS REQUIRED. |
Contents | |
---|---|
REGRESSION MODELS, PREDICTIVE MODELS AND ANALYTICAL MODELS. PROBABILITY MODELS FOR NON-GAUSSIAN DATA. THE EXPONENTIAL FAMILY. GENERALIZED LINEAR MODELS (GLM). MODELS FOR GAUSSIAN DATA. MODELS FOR NON-GAUSSIAN CONTINUOUS DATA. MODELS FOR BINARY DATA. MODELS FOR COUNT DATA. TWO-PART MODELS. LINEAR MODELS AND GLMS FOR BIG DATA. ESTIMATION OF MANY MODELS ON DISTRIBUTED DATASETS. ESTIMATION IN THE PRESENCE OF HIGH DIMENSIONALITY. PENALIZED ESTIMATION FOR GLMS: RIDGE AND LASSO. GENERALIZATIONS OF THE LASSO. ELASTIC NET. THE GROUP LASSO. THE FUSED LASSO. ESTIMATION OF STATISTICAL MODELS IN SPARK. LINEAR MODELS AND GLMS FOR BIG DATA IN R. PENALIZED ESTIMATION IN R. CASE STUDIES AND APPLICATIONS TO NOTABLE PROBLEMS. FOR STUDENTS OF DATA SCIENCE AND INNOVATION MANAGEMENT, THERE WILL BE AN ADDITIONAL LESSON (3 HOURS) TO PRESENT AND DISCUSS APPLICATIONS OF DATA SCIENCE TO MANAGEMENT PROBLEMS. |
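
As an illustration of the penalized-estimation topics listed above, the following is a minimal sketch of lasso estimation for a Gaussian linear model by cyclic coordinate descent with soft-thresholding. It is not taken from the course materials: the function names, the toy data and the objective scaling (1/(2n) times the residual sum of squares plus `lam` times the L1 norm, with no intercept) are assumptions made for this example, and plain Python is used only to keep the sketch self-contained.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    X is a list of rows, y a list of responses. No intercept is fitted,
    so the data are assumed to be centred beforehand (an assumption of
    this sketch).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j; fixed across iterations.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance between column j and the partial residual that
            # excludes the current contribution of beta[j].
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            # Univariate lasso update for coordinate j.
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy data with orthogonal columns (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.2, 3.0, 0.2]

print(lasso_coordinate_descent(X, y, lam=0.0))   # unpenalized fit
print(lasso_coordinate_descent(X, y, lam=0.25))  # weak coefficient is set to zero
```

With `lam=0.25` the second coefficient is shrunk exactly to zero while the first is only shrunk towards zero, which is the variable-selection behaviour that distinguishes the lasso from ridge estimation.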
Teaching Methods | |
---|---|
THE COURSE INCLUDES 60 HOURS OF CLASSROOM TEACHING. FOR STUDENTS OF DATA SCIENCE AND INNOVATION MANAGEMENT, THE COURSE INSTEAD INCLUDES 63 HOURS OF CLASSROOM TEACHING. ATTENDANCE IS NOT MANDATORY BUT, GIVEN THE NATURE OF THE COURSE, IS STRONGLY RECOMMENDED. THE LESSONS ADDRESS THEORETICAL ISSUES, CONSTANTLY SUPPORTED BY CASE STUDIES THAT CLARIFY HOW THE TECHNIQUES ARE IMPLEMENTED, THE CONTEXTS IN WHICH THE VARIOUS TOOLS ARE USED AND THE POSSIBLE INTERPRETATIONS OF THE RESULTS OBTAINED. THE EXERCISES THEREFORE FORM AN INTEGRAL PART OF THE SCHEDULED LESSONS. |
Verification of learning | |
---|---|
THE STUDENT WILL BE ASSESSED IN A FINAL EXAMINATION HELD ON THE EXAM DATES SCHEDULED BY THE DEPARTMENT. DURING THE FINAL EXAMINATION, THE STUDENT WILL DISCUSS A PROJECT WORK AND TAKE AN ORAL EXAMINATION. THE PROJECT MUST BE AGREED UPON WITH THE INSTRUCTOR DURING THE COURSE, FOLLOWING DETAILED GUIDELINES THAT WILL BE PROVIDED AT THE BEGINNING OF THE LECTURE CYCLE OR UPON REQUEST. |
Texts | |
---|---|
LECTURE NOTES, WEB SITES AND SUGGESTED PAPERS WILL BE MADE AVAILABLE BY THE INSTRUCTOR DURING SCHEDULED CLASSES. REFERENCE TEXTS: (1) PIET DE JONG AND GILLIAN HELLER, GENERALIZED LINEAR MODELS FOR INSURANCE DATA, CAMBRIDGE UNIVERSITY PRESS; (2) JAVIER LURASCHI, KEVIN KUO AND EDGAR RUIZ, MASTERING SPARK WITH R, O'REILLY. |
More Information | |
---|---|
THE INSTRUCTOR PROVIDES FURTHER EXPLANATIONS AND METHODOLOGICAL SUPPORT TO STUDENTS DURING OFFICE HOURS. THE DAYS, TIMES AND PLACE OF THE OFFICE HOURS, AS WELL AS ANY CHANGES, ARE COMMUNICATED ON THE INSTRUCTOR'S WEB PAGE. AN APPOINTMENT OUTSIDE THE SCHEDULED OFFICE HOURS CAN BE ARRANGED BY EMAILING THE INSTRUCTOR. |