ADVANCED STATISTICAL MODELLING FOR BIG DATA

0222400038
DEPARTMENT OF ECONOMICS AND STATISTICS
EQF7
STATISTICAL SCIENCES FOR FINANCE
2022/2023



COMPULSORY
YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2014
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
10     60       LESSONS
Objectives
THE COURSE AIMS TO PROVIDE: (I) KNOWLEDGE OF ADVANCED STATISTICAL MODELS USEFUL FOR UNDERSTANDING PROBLEMS AND IMPROVING THE MANAGEMENT OF DECISION-MAKING PROCESSES; (II) KNOWLEDGE OF ADVANCED STATISTICAL MODELS AND STATISTICAL LEARNING TOOLS THAT SUPPORT DECISIONS ABOUT PHENOMENA AND SYSTEMS IN WHICH LARGE QUANTITIES OF DATA, VARIABILITY AND UNCERTAINTY CREATE A LEVEL OF COMPLEXITY THAT TRADITIONAL TECHNIQUES CANNOT HANDLE; (III) THE ABILITY TO ANALYZE AND INTERPRET COMPLEX DATA AND TO BUILD PREDICTIVE AND ANALYTICAL MODELS THAT SUPPORT THE CONTROL AND MANAGEMENT POLICIES OF A COMPANY, IN BOTH THE PUBLIC AND PRIVATE SECTORS.
ALL STATISTICAL MODELS WILL BE PRESENTED AS BOTH PREDICTIVE AND ANALYTICAL/INTERPRETATIVE TOOLS, SO THAT STUDENTS ACQUIRE A DEEP UNDERSTANDING OF THE PROBLEMS THAT ARISE IN A GENERAL DECISION-MAKING PROCESS. IN PARTICULAR, STUDENTS WILL DEVELOP THE ABILITY TO SPECIFY, ESTIMATE AND VALIDATE A WIDE CLASS OF STATISTICAL MODELS APPLIED TO COMPLEX DATA STRUCTURES. SPECIFIC ATTENTION WILL BE GIVEN TO THE MODERN TOOLS AVAILABLE FOR MANAGING AND ANALYZING BIG DATA AND TO THE STATISTICAL PROGRAMMING LANGUAGES USED TO DEVELOP AND IMPLEMENT EFFECTIVE ANALYTICAL SOLUTIONS. SEVERAL CASE STUDIES WILL BE PRESENTED AND DISCUSSED SO THAT STUDENTS LEARN TO APPLY THEIR KNOWLEDGE TO REAL PROBLEMS AND REAL DATA SETS.
Prerequisites
KNOWLEDGE OF CALCULUS AND MATRIX CALCULUS, BASIC PROGRAMMING, THE R STATISTICAL LANGUAGE, PROBABILITY AND STATISTICAL INFERENCE IS REQUIRED.
Contents
THE COURSE IS A SINGLE MODULE OF 60 HOURS (LM STATISTICAL SCIENCES FOR FINANCE) OR 63 HOURS (LM DATA SCIENCE AND INNOVATION MANAGEMENT).

REGRESSION MODELS, PREDICTIVE MODELS AND ANALYTICAL MODELS. PROBABILITY MODELS FOR NON-GAUSSIAN DATA. THE EXPONENTIAL FAMILY. GENERALIZED LINEAR MODELS (GLM). MODELS FOR GAUSSIAN DATA. MODELS FOR NON-GAUSSIAN CONTINUOUS DATA. MODELS FOR BINARY DATA. MODELS FOR COUNT DATA. TWO-PART MODELS. LINEAR MODELS AND GLMS FOR BIG DATA. ESTIMATION OF MANY MODELS ON DISTRIBUTED DATASETS. ESTIMATION IN THE PRESENCE OF HIGH DIMENSIONALITY. PENALIZED ESTIMATION FOR GLMS: RIDGE AND LASSO. GENERALIZATIONS OF THE LASSO: ELASTIC NET, GROUP LASSO, FUSED LASSO. ESTIMATION OF STATISTICAL MODELS IN SPARK. LINEAR MODELS AND GLMS FOR BIG DATA IN R. PENALIZED ESTIMATION IN R. CASE STUDIES AND APPLICATIONS TO NOTABLE PROBLEMS.
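
AS AN ILLUSTRATION OF THE RIDGE, LASSO AND ELASTIC NET TOPICS LISTED ABOVE, THE FOLLOWING IS A MINIMAL SKETCH AND NOT COURSE MATERIAL. IT ASSUMES A GAUSSIAN LINEAR MODEL WITH CENTERED DATA AND NO INTERCEPT, AND IT USES THE COMMON ELASTIC NET PARAMETRIZATION IN WHICH alpha = 1 GIVES THE LASSO AND alpha = 0 GIVES RIDGE. THE COURSE ITSELF WORKS IN R AND SPARK; PYTHON IS USED HERE ONLY TO KEEP THE EXAMPLE SELF-CONTAINED. THE FUNCTION NAMES soft_threshold AND elastic_net AND THE TOY DATA ARE HYPOTHETICAL.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the penalized least squares objective

        (1 / 2n) * sum_i (y_i - x_i' b)^2
        + lam * (alpha * sum_j |b_j| + (1 - alpha) / 2 * sum_j b_j^2)

    alpha = 1 is the lasso, alpha = 0 is ridge.
    Assumption: X and y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # mean of x_j times the partial residual
            z = 0.0    # mean of x_j squared
            for i in range(n):
                # Fitted value that leaves out predictor j.
                fitted_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fitted_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + lam * (1 - alpha)
            if denom == 0.0:
                # All-zero column with a pure lasso penalty: keep it at zero.
                beta[j] = 0.0
                continue
            beta[j] = soft_threshold(rho, lam * alpha) / denom
    return beta


# Toy data (hypothetical): two centered, orthogonal predictors;
# y depends only on the first one, with true coefficient 2.
X = [[-1, 1], [1, 1], [-1, -1], [1, -1]]
y = [-2, 2, -2, 2]

lasso_fit = elastic_net(X, y, lam=0.5, alpha=1.0)
ridge_fit = elastic_net(X, y, lam=0.5, alpha=0.0)
enet_fit = elastic_net(X, y, lam=0.5, alpha=0.5)
print("lasso:", lasso_fit)
print("ridge:", ridge_fit)
print("elastic net:", enet_fit)
```

ON THIS TOY DESIGN THE UNPENALIZED LEAST SQUARES COEFFICIENT OF THE FIRST PREDICTOR IS 2. THE LASSO WITH lam = 0.5 SHRINKS IT TO 1.5 AND KEEPS THE IRRELEVANT SECOND COEFFICIENT AT EXACTLY ZERO, WHILE RIDGE SHRINKS PROPORTIONALLY WITHOUT SETTING COEFFICIENTS TO ZERO IN GENERAL.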
Teaching Methods
THE COURSE CONSISTS OF 60 HOURS (LM STATISTICAL SCIENCES FOR FINANCE) OR 63 HOURS (LM DATA SCIENCE AND INNOVATION MANAGEMENT) OF CLASSROOM TEACHING. ALTHOUGH NOT MANDATORY, ATTENDANCE IS STRONGLY RECOMMENDED GIVEN THE NATURE OF THE COURSE.
DURING THE LESSONS, THEORETICAL TOPICS WILL BE ADDRESSED AND CONSTANTLY SUPPORTED BY CASE STUDIES THAT CLARIFY HOW THE TECHNIQUES ARE IMPLEMENTED, THE CONTEXTS IN WHICH THE VARIOUS TOOLS ARE USED AND THE POSSIBLE INTERPRETATIONS OF THE RESULTS OBTAINED. THE EXERCISES THEREFORE FORM AN INTEGRAL PART OF THE SCHEDULED LESSONS.
Verification of learning
STUDENTS ARE ASSESSED THROUGH A FINAL EXAM HELD ON THE EXAM DATES SCHEDULED BY THE DEPARTMENT.
THE FINAL EXAM CONSISTS OF A WRITTEN TEST (MARKED OUT OF THIRTY) AND AN ORAL TEST, WHICH IS TYPICALLY HELD IN THE DAYS IMMEDIATELY FOLLOWING. THE DATE OF THE WRITTEN TEST IS THE ONE SET IN THE DEPARTMENT CALENDAR; THE DAY OF THE ORAL TEST IS AGREED WITH THE STUDENTS AT THE END OF THE WRITTEN TEST.
THE WRITTEN TEST (ABOUT 2 HOURS) ASSESSES THE STUDENT'S ABILITY TO USE THE SOFTWARE TOOLS COVERED IN THE COURSE, TO APPLY THE EXPLORATORY AND INFERENTIAL STATISTICAL TECHNIQUES STUDIED, AND TO INTERPRET AND COMMENT ON THE STATISTICAL RESULTS OBTAINED. THE STUDENT RECEIVES AN EXAM PAPER AND IS ASKED TO ANSWER 5 QUESTIONS (EACH WORTH A MAXIMUM OF 6 POINTS) COVERING THE ENTIRE COURSE PROGRAM. THE ORAL TEST (ABOUT 30 MINUTES) CONSISTS OF AN INTERVIEW WITH QUESTIONS AND A DISCUSSION OF THE WRITTEN PAPER. THE FINAL MARK (MIN 18, MAX 30 WITH POSSIBLE HONORS) IS ASSIGNED BY EVALUATING THE RESULTS OF THE WRITTEN AND ORAL TESTS, TAKING INTO ACCOUNT MASTERY OF THE COURSE CONTENT, APPROPRIATENESS OF THE DEFINITIONS AND THEORETICAL REFERENCES, CLARITY OF ARGUMENT AND COMMAND OF SPECIALIZED LANGUAGE.
THE EXAM DOES NOT INCLUDE INTERMEDIATE (MID-TERM) TESTS.
Texts
LECTURE NOTES, WEB SITES AND SUGGESTED PAPERS WILL BE MADE AVAILABLE BY THE INSTRUCTOR DURING SCHEDULED CLASSES.
- GENERALIZED LINEAR MODELS FOR INSURANCE DATA, BY PIET DE JONG AND GILLIAN HELLER, CAMBRIDGE UNIVERSITY PRESS
- MASTERING SPARK WITH R, BY JAVIER LURASCHI, KEVIN KUO AND EDGAR RUIZ, O'REILLY
More Information
THE INSTRUCTOR PROVIDES FURTHER EXPLANATIONS AND METHODOLOGICAL SUPPORT TO STUDENTS DURING OFFICE HOURS.
THE DAYS, TIMES AND PLACE OF OFFICE HOURS, AS WELL AS ANY CHANGES, ARE COMMUNICATED ON THE INSTRUCTOR’S WEB PAGE.
AN APPOINTMENT OUTSIDE THE SCHEDULED OFFICE HOURS CAN BE ARRANGED BY SENDING AN EMAIL TO THE INSTRUCTOR.