STATISTICAL DATA ANALYSIS

International Teaching

0622900026
DIPARTIMENTO DI INGEGNERIA DELL'INFORMAZIONE ED ELETTRICA E MATEMATICA APPLICATA
EQF7
DIGITAL HEALTH AND BIOINFORMATIC ENGINEERING
2021/2022



YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2018
SECOND SEMESTER
CFU   HOURS   ACTIVITY
1. STATISTICAL DATA ANALYSIS - MOD.2
3     24      LESSONS
3     24      EXERCISES
2. STATISTICAL DATA ANALYSIS - MOD.1
2     16      LESSONS
1     8       EXERCISES
Objectives
THE COURSE HAS THE TWOFOLD PURPOSE OF: I) ILLUSTRATING THE MAIN METHODOLOGIES OF INTEREST FOR STATISTICAL DATA ANALYSIS; II) APPLYING SUCH METHODOLOGIES TO RELEVANT PRACTICAL PROBLEMS, USING TOOLS COMMONLY EMPLOYED FOR STATISTICAL ANALYSIS, DATA VISUALIZATION AND PROCESSING.


KNOWLEDGE AND UNDERSTANDING.
• ACQUISITION OF THE MAIN METHODS OF STATISTICAL INFERENCE AND DATA ANALYSIS.
• UNDERSTANDING OF PARAMETRIC VS. NONPARAMETRIC APPROACHES AND OF SUPERVISED VS. UNSUPERVISED APPROACHES.
• ACQUISITION OF THE MAIN TECHNIQUES AND TOOLS FOR BIG DATA ANALYSIS.

APPLICATION KNOWLEDGE AND UNDERSTANDING.
• ABILITY TO APPLY THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS TO PRACTICAL PROBLEMS (E.G., SOCIAL OR BIOMEDICAL DATA).
• ABILITY TO EXAMINE BIG DATA ARRANGED IN COMPLEX AND/OR HETEROGENEOUS STRUCTURES.
• ABILITY TO USE SOFTWARE (E.G., R, PYTHON, MATLAB) FOR STATISTICAL DATA ANALYSIS, DATA VISUALIZATION AND PROCESSING.
• ABILITY TO USE TOOLS OF PRACTICAL INTEREST FOR DATA ANALYTICS (E.G., APACHE SPARK).
Prerequisites
ADEQUATE KNOWLEDGE OF MATHEMATICS AND OF THE FUNDAMENTALS OF PROBABILITY AND STATISTICS.

Contents
- FUNDAMENTALS OF STATISTICS (HOURS FOR LECTURE/EXERCISES/LABORATORY: 7/2/1)
STATISTICAL INFERENCE, PARAMETRIC METHODS, MAXIMUM LIKELIHOOD. DECISION THEORY. BAYESIAN APPROACH.
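AS A MINIMAL ILLUSTRATION OF MAXIMUM LIKELIHOOD (A SKETCH, NOT TAKEN FROM THE COURSE MATERIAL), THE PYTHON SNIPPET BELOW COMPUTES THE CLOSED-FORM ML ESTIMATES OF THE MEAN AND VARIANCE OF I.I.D. GAUSSIAN SAMPLES AND THE CORRESPONDING LOG-LIKELIHOOD. THE FUNCTION NAMES gaussian_mle AND gaussian_log_likelihood AND THE SAMPLE VALUES ARE HYPOTHETICAL.

```python
import math


def gaussian_mle(samples):
    """Closed-form ML estimates (mu_hat, var_hat) for i.i.d. N(mu, var) data."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The ML variance divides by n, not n - 1, so it is biased.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat


def gaussian_log_likelihood(samples, mu, var):
    """Log-likelihood of the samples under N(mu, var)."""
    n = len(samples)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in samples) / (2 * var))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)  # 5.0 4.0
```

ANY OTHER CHOICE OF (MU, VAR) GIVES A LOWER LOG-LIKELIHOOD ON THE SAME DATA, WHICH IS WHAT MAKES (5.0, 4.0) THE ML ESTIMATE HERE.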

- DATA NORMALIZATION. WHITENING (1/0/1)
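A MINIMAL SKETCH OF THE TWO PREPROCESSING STEPS, ASSUMING PLAIN PYTHON LISTS: standardize PERFORMS Z-SCORE NORMALIZATION, WHILE whiten_2d WHITENS TWO-DIMENSIONAL DATA USING THE CHOLESKY FACTOR OF THE SAMPLE COVARIANCE, SO THAT THE TRANSFORMED DATA HAVE IDENTITY COVARIANCE. CHOLESKY WHITENING IS ONLY ONE POSSIBLE CHOICE (PCA AND ZCA WHITENING ARE COMMON ALTERNATIVES), AND BOTH FUNCTION NAMES ARE HYPOTHETICAL.

```python
import math
import statistics


def standardize(values):
    """Z-score normalization: zero mean and unit (population) variance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def whiten_2d(points):
    """Whiten 2-D points with the Cholesky factor of their sample covariance.

    If Sigma = L L^T, then z = L^{-1} (x - mu) has identity sample covariance.
    """
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    centered = [(p[0] - m1, p[1] - m2) for p in points]
    # Sample covariance entries (dividing by n).
    s11 = sum(a * a for a, _ in centered) / n
    s12 = sum(a * b for a, b in centered) / n
    s22 = sum(b * b for _, b in centered) / n
    # Cholesky factor L = [[l11, 0], [l21, l22]] of [[s11, s12], [s12, s22]].
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    # Forward substitution solves L z = (x - mu) for each centered point.
    whitened = []
    for a, b in centered:
        z1 = a / l11
        z2 = (b - l21 * z1) / l22
        whitened.append((z1, z2))
    return whitened


pts = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 6.0), (5.0, 5.5)]
print(standardize([2.0, 4.0, 6.0]))
print(whiten_2d(pts))
```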

- INTRODUCTION TO SUPERVISED LEARNING AND LINEAR MODELS (6/0/3)
MULTIPLE LINEAR REGRESSION. GENERALIZED LINEAR MODELS.

- CLASSIFICATION (10/3/3)
LOGISTIC REGRESSION. LINEAR DISCRIMINANT ANALYSIS. BAYESIAN FORMULATION OF REGRESSION/CLASSIFICATION. BIAS AND VARIANCE. NAÏVE BAYES. NONPARAMETRIC SUPERVISED APPROACHES. EXAMPLES: NAÏVE KERNEL, NEAREST NEIGHBOR AND K-NEAREST NEIGHBORS.
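AS AN ILLUSTRATIVE SKETCH OF A NONPARAMETRIC SUPERVISED CLASSIFIER (NOT THE COURSE'S OWN CODE), knn_predict BELOW ASSIGNS A QUERY POINT THE MAJORITY LABEL AMONG ITS K NEAREST TRAINING POINTS UNDER EUCLIDEAN DISTANCE; WITH K = 1 IT REDUCES TO THE NEAREST-NEIGHBOR RULE. THE FUNCTION NAME, THE TOY DATA AND THE TIE-BREAKING POLICY ARE ASSUMPTIONS.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`.

    Distances are Euclidean (math.dist). Vote ties are broken in favour of
    the tied label whose closest member is nearest to the query, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    neighbours = sorted(
        zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))  # a
print(knn_predict(train_x, train_y, (4.0, 4.0), k=1))  # b
```

NO MODEL IS FITTED: THE WHOLE TRAINING SET IS KEPT AND SEARCHED AT PREDICTION TIME, WHICH IS WHAT MAKES THE METHOD NONPARAMETRIC.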

- RESAMPLING (2/0/1)
CROSS-VALIDATION (LEAVE-ONE-OUT, K-FOLD). BOOTSTRAP.
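A MINIMAL SKETCH OF THE TWO RESAMPLING SCHEMES IN A TOY SETTING: cv_mse_of_mean_predictor ESTIMATES BY K-FOLD CROSS-VALIDATION THE TEST MSE OF A TRIVIAL MODEL THAT PREDICTS THE TRAINING MEAN (K = N GIVES LEAVE-ONE-OUT), AND bootstrap_se ESTIMATES THE STANDARD ERROR OF A STATISTIC BY RESAMPLING WITH REPLACEMENT. ALL FUNCTION NAMES AND THE MEAN-PREDICTOR MODEL ARE HYPOTHETICAL STAND-INS FOR A REAL MODEL.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) with a fixed seed and deal it into k folds of
    near-equal size. Setting k = n gives leave-one-out (LOO)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse_of_mean_predictor(y, k, seed=0):
    """K-fold CV estimate of the test MSE of a trivial model that predicts
    the mean of the training folds."""
    squared_errors = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = statistics.fmean(train)
        squared_errors.extend((y[i] - prediction) ** 2 for i in fold)
    # Pooled over all held-out points; equals the average of the per-fold
    # MSEs when the folds have equal size.
    return statistics.fmean(squared_errors)


def bootstrap_se(data, stat, b=1000, seed=0):
    """Bootstrap standard error of stat(data): draw b resamples of size n
    with replacement and take the standard deviation of the replicates."""
    rng = random.Random(seed)
    replicates = [stat(rng.choices(data, k=len(data))) for _ in range(b)]
    return statistics.stdev(replicates)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(cv_mse_of_mean_predictor(y, k=len(y)))  # LOO estimate: 3.125
print(bootstrap_se(list(range(100)), statistics.fmean))
```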

- LINEAR MODEL SELECTION AND REGULARIZATION (9/0/3)
STEPWISE SELECTION. RIDGE REGRESSION. LASSO. DIMENSIONALITY REDUCTION. PRINCIPAL COMPONENT REGRESSION. EXTENSION TO HIGH-DIMENSIONAL DATA. SPARSITY-AWARE METHODS FOR BIG DATA ANALYTICS.
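TO ILLUSTRATE THE DIFFERENT SHRINKAGE BEHAVIOR OF RIDGE REGRESSION AND THE LASSO, THE SKETCH BELOW USES THE SIMPLEST SPECIAL CASE (A SINGLE PREDICTOR, NO INTERCEPT), WHERE BOTH ESTIMATES HAVE CLOSED FORMS; THE FUNCTION NAMES AND THE TOY DATA ARE HYPOTHETICAL. RIDGE SHRINKS THE COEFFICIENT TOWARD ZERO, WHILE THE LASSO SOFT-THRESHOLDS IT AND SETS IT EXACTLY TO ZERO ONCE LAMBDA IS LARGE ENOUGH, WHICH IS THE SOURCE OF ITS SPARSITY.

```python
def ridge_1d(x, y, lam):
    """Minimizer of sum((y_i - b*x_i)^2) + lam * b^2 (one predictor, no
    intercept): b = sum(x_i*y_i) / (sum(x_i^2) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_1d(x, y, lam):
    """Minimizer of sum((y_i - b*x_i)^2) + lam * |b| (one predictor, no
    intercept): b = S(sum(x_i*y_i), lam/2) / sum(x_i^2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam / 2) / sxx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for lam in (0.0, 14.0, 56.0):
    print(lam, ridge_1d(x, y, lam), lasso_1d(x, y, lam))
```

WITH THESE DATA (SUM OF X*Y = 28, SUM OF X^2 = 14), LAMBDA = 0 GIVES THE LEAST-SQUARES SLOPE 2.0 FOR BOTH METHODS, WHILE AT LAMBDA = 56 RIDGE STILL RETURNS 0.4 AND THE LASSO RETURNS EXACTLY 0.0.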

- GENERALIZED ADDITIVE MODELS AND TREE-BASED METHODS (1/0/0)

- SUPPORT VECTOR MACHINES (1/0/0)

- UNSUPERVISED LEARNING (10/3/3)
PRINCIPAL COMPONENTS ANALYSIS. CENTROID-BASED CLUSTERING: K-MEANS. HIERARCHICAL CLUSTERING. OTHER EXAMPLES OF CLUSTERING. GAUSSIAN MIXTURES AND THE EXPECTATION-MAXIMIZATION ALGORITHM. DENSITY-BASED CLUSTERING: DBSCAN.
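A MINIMAL SKETCH OF CENTROID-BASED CLUSTERING VIA LLOYD'S ALGORITHM FOR K-MEANS, ASSUMING POINTS GIVEN AS TUPLES AND INITIAL CENTROIDS DRAWN AT RANDOM FROM THE DATA (THE FUNCTION NAME kmeans AND THE INITIALIZATION STRATEGY ARE ASSUMPTIONS). THE RESULT DEPENDS ON THE INITIAL CENTROIDS, SO IN PRACTICE THE ALGORITHM IS USUALLY RUN FROM SEVERAL STARTS AND THE SOLUTION WITH THE LOWEST WITHIN-CLUSTER SUM OF SQUARES IS KEPT.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid update until the assignments stop changing (or iters runs out).
    Initial centroids are k distinct data points drawn at random."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```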

- NONPARAMETRIC STATISTICS AND INTRODUCTION TO FUNCTIONAL DATA ANALYSIS (1/0/1)

TOTAL HOURS (LECTURE/EXERCISES/LABORATORY): 48/8/16

- SOFTWARE AND TOOLS:
R
PYTHON
MATLAB
APACHE SPARK
Teaching Methods
THE COURSE INCLUDES THEORETICAL LECTURES AND CLASSROOM EXERCISES, INCLUDING COMPUTER-BASED EXERCISES.
Verification of learning
THE FINAL EXAM CONSISTS OF THE DISCUSSION OF A PROJECT WORK, AIMED AT EVALUATING THE KNOWLEDGE AND UNDERSTANDING OF THE CONCEPTS PRESENTED DURING THE COURSE AND THE ABILITY TO SOLVE STATISTICAL DATA ANALYSIS PROBLEMS BY APPLYING THE METHODS AND TOOLS ILLUSTRATED DURING THE COURSE. THE STUDENT'S INDEPENDENT JUDGEMENT, COMMUNICATION SKILLS AND LEARNING ABILITIES ARE ALSO EVALUATED.
Texts
AN INTRODUCTION TO STATISTICAL LEARNING,
G. JAMES, D. WITTEN, T. HASTIE, R. TIBSHIRANI,
SPRINGER, 2013.

AN ELEMENTARY INTRODUCTION TO STATISTICAL LEARNING THEORY,
S. KULKARNI, G. HARMAN,
WILEY, 2010.

SUPPLEMENTARY TEACHING MATERIAL WILL BE AVAILABLE ON THE UNIVERSITY E-LEARNING PLATFORM (HTTP://ELEARNING.UNISA.IT) ACCESSIBLE TO STUDENTS USING THEIR OWN UNIVERSITY CREDENTIALS.
More Information
THE COURSE IS HELD IN ENGLISH.
Data source: ESSE3