If you would prefer to join the introductory course on machine learning with Stata, click here.
No prior knowledge of machine learning techniques is required to attend this course, as the first session will start from scratch with a fresh introduction to the subject. This course will focus on three specific techniques not covered in the first part of the course, namely: regression and classification trees (including bagging, random forests, and boosting), kernel-based regression, and global methods (step-wise, polynomial, spline, and series regressions).
The teaching approach will rely mainly on graphical language and intuition rather than on algebra. The training will make use of instructional as well as real-world examples, and will evenly balance theory and practical sessions.
After the course, participants are expected to have an improved understanding of Stata's potential to perform some of the most widely used machine learning techniques, and thus to be able to master research tasks including:
- (i) factor-importance detection,
- (ii) signal-from-noise extraction,
- (iii) model-free regression and classification, both from a data-mining and a causal perspective.
The course is open to people from all scientific fields, but it is particularly targeted at researchers working in the medical, epidemiological and socio-economic sciences.
Course Agenda: Advanced course
Session 1: The basics of Machine Learning (10:00 - 12:00 London time)
Machine Learning: definition, rationale, usefulness
Supervised vs. unsupervised learning
Regression vs. classification problems
Inference vs. prediction
Sampling vs. specification error
Coping with the fundamental non-identifiability of E(y|x)
Parametric vs. non-parametric models
The trade-off between prediction accuracy and model interpretability
Measuring the quality of fit: in-sample vs. out-of-sample prediction power
The bias-variance trade-off and the Mean Square Error (MSE) minimization
Training vs. test mean square error
The information criteria approach
Estimating training and test error
Validation set, K-fold cross-validation, and the Bootstrap
Session 2: Kernel-based and Nearest-neighbour methods (14:00 - 16:00 London time)
- Beyond parametric models: an overview
- Local, semi-global, and global approaches
- Kernel-based regression
- Nearest-neighbour regression
- Nearest-neighbour classification
Session 3: Semi-global and global approaches (10:00 - 12:00 London time)
- Constant step-function
- Piecewise polynomials
- Spline regression
- Polynomial and series estimators
- Partially linear models
- Generalised additive models
- Stata implementation
Session 4: Tree-based methods (14:00 - 16:00 London time)
Regression and classification trees: an introduction
- Growing a tree via recursive binary splitting
- Optimal tree pruning via cross-validation
Tree-based ensemble methods
- Random forests
Session 3 - 1 hour: Q&A with the instructor
Knowledge of basic statistics, Stata and econometrics is required, including:
- the notion of conditional expectation and related properties;
- point and interval estimation;
- regression model and related properties;
- probit and logit regression.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, T., Tibshirani, R., Friedman, J., Springer (2009)
- An Introduction to Statistical Learning, James, G., Witten, D., Hastie, T., Tibshirani, R., Springer (2013)
- Microeconometrics Using Stata, Cameron and Trivedi, Revised Edition, Stata Press (2010)
- A Super-Learning Machine for Predicting Economic Outcomes, Giovanni Cerulli
Terms & Conditions
- Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
- Additional discounts are available for multiple registrations. Contact us for more information.
- Temporary, time-limited licences for the software used in the course will be provided. You are required to install the software provided prior to the start of the course.
- Full payment of course fees is required prior to the course start date to guarantee your place.
- Registration closes 1 calendar day prior to the start of the course.
Cancellations or changes to your registration
- 100% fee returned for cancellations made more than 28 calendar days prior to the start of the course.
- 50% fee returned for cancellations made between 14 and 28 calendar days prior to the start of the course.
- No fee returned for cancellations made less than 14 calendar days prior to the start of the course.