
An Introduction to Machine Learning using Stata - Co-Developed with Lancaster University

Online, 2 days (11 June 2024 to 12 June 2024)
Software: Stata
Level: Introductory, Intermediate
Delivered by: Dr Giovanni Cerulli, IRCrES-CNR
Topics: Automation, Big Data, Data Management, Programming, Statistics

Overview


After the course, you can expect to achieve the following learning outcomes:

  • Comprehensive Understanding of Stata's Machine Learning Capabilities.
    • Participants will gain a deep understanding of how to leverage Stata's various machine learning packages. This includes proficiency in applying machine learning techniques for data analysis, with an emphasis on practical implementation.
  • Mastering Research Tasks.
    • Participants will be equipped with the skills to master essential research tasks, such as factor-importance detection, signal-from-noise extraction, correct model specification, and model-free classification. This proficiency extends to both data-mining and causal perspectives.
  • Graphical Language and Intuitive Approach.
    • The course emphasizes a graphical language and intuitive understanding over algebraic methods. Participants will develop the ability to visually interpret and communicate results, fostering a practical and applied approach to machine learning with Stata.

This course is a primer for more advanced machine learning courses in Stata.

After attending this course, you will receive a signed certificate of attendance as proof of professional development.

The course is delivered by Dr Giovanni Cerulli, Researcher at IRCrES-CNR (Research Institute on Sustainable Economic Growth, National Research Council of Italy).


Dr Cerulli's video guide to machine learning regression demonstrates the command r_ml_Stata. The models this command can fit include elastic net, regression trees, neural networks, boosting, support vector machines, and bagging and random forests.

Topic Background:

Recent years have witnessed an unprecedented availability of information on social, economic, and health-related phenomena. Researchers, practitioners, and policymakers now have access to huge datasets (so-called “Big Data”) on people, companies and institutions, web and mobile devices, satellites, etc., at increasing speed and detail.

Machine learning is a relatively new approach to data analytics that sits at the intersection of statistics, computer science, and artificial intelligence. Its primary objective is to turn information into knowledge and value by “letting the data speak”. Machine learning makes few prior assumptions about the structure of the data. It follows a model-free philosophy that relies on algorithm development, computational procedures, and graphical inspection rather than on tight assumptions, algebraic derivation, and analytical solutions. Machine learning was computationally infeasible only a few years ago and is a product of the computer era and of today’s machines.

Today, various machine learning packages are available in Stata, but some of them are not widely known among Stata users. This course fills that gap by making participants familiar with Stata's potential to draw knowledge and value from large and possibly noisy datasets. The teaching approach relies on graphical language and intuition more than on algebra. The sessions use both instructional and real-world examples and balance theory and practice evenly.


Real-world applications:

  • Informed Decision-Making in Various Domains: Participants will be empowered to apply machine learning techniques in diverse fields, such as social sciences, economics, and health. This knowledge will enable them to make informed decisions based on insights extracted from large datasets.
  • Enhanced Research Capabilities: Researchers can apply the learned techniques to strengthen their research methodology. The course's focus on correct model specification and model-free classification supports robust analysis and more reliable research findings.
  • Efficient Data Utilization: Professionals and policymakers will benefit from the ability to extract valuable information from large and possibly noisy datasets. This efficiency in data utilization can lead to improved policy formulation, strategic planning, and business decision-making.

Course Timetable

  • Morning session: 10am-12pm (London time)
  • Afternoon session: 2pm-4pm (London time)
  • Q&A with instructor: 4pm-4:30pm (London time)


Course Agenda

Day 1:

Session 1 - The basics of Machine Learning

Session overview:

  • Machine Learning: definition, rationale, usefulness
    • Supervised vs. unsupervised learning
    • Regression vs. classification problems
    • Inference vs. prediction 
    • Sampling vs. specification error
  • Coping with the fundamental non-identifiability of E(y|x) 
    • Parametric vs. non-parametric models
    • The trade-off between prediction accuracy and model interpretability
  • Goodness-of-fit measures
    • Measuring the quality of fit: in-sample vs. out-of-sample prediction power
    • The bias-variance trade-off and minimization of the mean square error (MSE)
    • Training vs. test mean square error
    • The information criteria approach
  • Estimating training and test error
    • Validation set, K-fold cross-validation, and the Bootstrap
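K-fold cross-validation is one of the resampling methods listed above, and it can be sketched in a few lines. The block below is a minimal illustration in plain Python, not the Stata code used in the course. It fits a one-predictor least-squares line on K-1 folds and computes the mean squared error on the held-out fold, then averages over the folds. The function names `fit_line` and `kfold_mse`, the choice K = 5, and the simulated data are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def kfold_mse(xs, ys, k=5, seed=0):
    """Estimate the test MSE of fit_line by K-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K roughly equal-sized folds
    fold_errors = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        # Fit on the K-1 training folds only
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Score on the held-out fold
        preds = [a + b * xs[i] for i in test_idx]
        fold_errors.append(mse([ys[i] for i in test_idx], preds))
    return sum(fold_errors) / k


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]  # made-up data, noise sd = 1
    a, b = fit_line(xs, ys)
    train_err = mse(ys, [a + b * x for x in xs])
    cv_err = kfold_mse(xs, ys, k=5)
    print(f"training MSE: {train_err:.3f}  5-fold CV MSE: {cv_err:.3f}")
```

Comparing the training MSE with the cross-validated MSE illustrates the difference between training error and test error. The training figure is computed on data the model has already seen, so it is typically the more optimistic of the two.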


Session 2 - Model selection as a correct specification procedure

Session overview:

  • Model selection as a correct specification procedure
  • The information criteria approach
  • Subset Selection
    • Best subset selection 
    • Backward stepwise selection 
    • Forward stepwise selection
  • Shrinkage Methods 
    • Lasso, ridge, and elastic-net regression
    • Adaptive Lasso
    • Information criteria and cross-validation for Lasso
  • Stata implementation
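The defining step of Lasso is soft-thresholding, which shrinks coefficients toward zero and sets small ones exactly to zero. The sketch below is again plain Python and is not taken from the course materials. It assumes centred predictor columns and a centred outcome (no intercept), and it minimises (1/2n) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent. The names `lasso_cd` and `soft_threshold` are hypothetical. In Stata itself, versions 16 and later ship a built-in `lasso` command, so this sketch only shows the mechanics.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of rows. Assumes centred columns and a centred y (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) * sum_i x_ij * partial residual (fit without predictor j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residuals in sync with the updated coefficient
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


if __name__ == "__main__":
    # Made-up centred design with orthogonal columns, (1/n) * sum(x_ij^2) = 1
    X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    y = [3.5, 2.5, -2.5, -3.5]  # y = 3*x1 + 0.5*x2 exactly
    for lam in (0.0, 1.0, 4.0):
        print(lam, lasso_cd(X, y, lam))
```

With this orthogonal toy design, each Lasso coefficient equals the soft-thresholded least-squares coefficient. Setting lam = 1 shrinks (3, 0.5) to (2, 0), and lam = 4 sets both coefficients to zero. Ridge regression, by contrast, rescales coefficients and does not set them exactly to zero.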

Day 2:

Session 1 - Discriminant analysis and nearest-neighbor classification

Session overview:

  • The classification setting
  • Bayes optimal classifier and decision boundary
  • Misclassification error rate
  • Discriminant analysis
    • Linear and quadratic discriminant analysis 
    • Naive Bayes classifier
  • The K-nearest neighbors classifier
  • Stata implementation
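A K-nearest-neighbours classifier and the misclassification error rate can be sketched as follows. This is a plain-Python illustration with made-up data, not the course's Stata implementation. The names `knn_predict` and `error_rate`, the use of Euclidean distance, and k = 3 are assumptions chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Assign x to the majority class among its k nearest training points.

    Distances are Euclidean. Ties in the vote go to the class encountered
    first among the neighbours, which are ordered nearest first.
    """
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def error_rate(y_true, y_pred):
    """Misclassification error rate: share of observations given the wrong class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


if __name__ == "__main__":
    # Made-up two-class training and test data
    train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
    train_y = ["A", "A", "A", "B", "B", "B"]
    test_X = [(1.5, 1.5), (6.5, 6.5), (5, 5)]
    test_y = ["A", "B", "B"]
    preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
    print(preds, error_rate(test_y, preds))
```

A small k produces flexible, high-variance decision boundaries, and a large k produces smoother, higher-bias ones. For this reason k is usually chosen by cross-validation.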

Session 2 - Neural networks

Session overview:

  • The neural network model
    • Neurons, hidden layers, and multiple outcomes
  • Training a neural network
    • Back-propagation via gradient descent 
    • Fitting with high-dimensional data
    • Fitting remarks
  • Cross-validating neural network hyperparameters
  • Stata implementation
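The training loop for a single-hidden-layer network can be sketched as below. It performs a forward pass, back-propagates the squared-error gradient through a tanh layer, and then takes a full-batch gradient-descent step. This is a minimal plain-Python illustration, not the course's Stata implementation. The names `train_mlp` and `predict`, the tanh activation, and the values of `hidden`, `lr` and `epochs` are hypothetical choices. They are also the kind of hyperparameters one would tune by cross-validation.

```python
import math
import random


def train_mlp(xs, ys, hidden=5, lr=0.02, epochs=1000, seed=0):
    """Fit y_hat = sum_h v[h] * tanh(w[h] * x + b[h]) + c by full-batch gradient descent.

    Minimises the mean squared error over scalar inputs xs and targets ys.
    Returns the fitted parameters and the loss recorded at each epoch.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(hidden)]  # input-to-hidden weights
    b = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden-unit biases
    v = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden-to-output weights
    c = 0.0                                          # output bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        gw = [0.0] * hidden
        gb = [0.0] * hidden
        gv = [0.0] * hidden
        gc = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass
            a = [math.tanh(w[h] * x + b[h]) for h in range(hidden)]
            err = sum(v[h] * a[h] for h in range(hidden)) + c - y
            loss += err ** 2 / n
            # Backward pass: d(loss)/d(y_hat) for this observation
            d_out = 2.0 * err / n
            gc += d_out
            for h in range(hidden):
                gv[h] += d_out * a[h]
                # Chain rule through tanh: tanh'(z) = 1 - tanh(z)^2
                d_hidden = d_out * v[h] * (1.0 - a[h] ** 2)
                gw[h] += d_hidden * x
                gb[h] += d_hidden
        losses.append(loss)  # loss before this epoch's update
        # Gradient-descent step on every parameter
        for h in range(hidden):
            w[h] -= lr * gw[h]
            b[h] -= lr * gb[h]
            v[h] -= lr * gv[h]
        c -= lr * gc
    return (w, b, v, c), losses


def predict(params, x):
    """Network output for a single scalar input x."""
    w, b, v, c = params
    return sum(vh * math.tanh(wh * x + bh) for wh, bh, vh in zip(w, b, v)) + c


if __name__ == "__main__":
    xs = [i / 10 for i in range(-30, 31)]  # made-up inputs on [-3, 3]
    ys = [math.sin(x) for x in xs]         # made-up target function
    params, losses = train_mlp(xs, ys)
    print(f"MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
    print("prediction at x = 1.0:", round(predict(params, 1.0), 3))
```

All gradients are accumulated with the current weights before any parameter is updated. This is what makes the step a true full-batch gradient-descent update.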

Pre-requisites

  • Knowledge of basic statistics, Stata and econometrics is required, including:

    • the notion of conditional expectation and its properties;

    • point and interval estimation;

    • the regression model and its properties;

    • probit and logit regression.

Pre-course reading

  • Cerulli, G. (2023), “Fundamentals of Supervised Machine Learning: With Applications in Python, R, and Stata”, Springer.
  • Cerulli, G. (2021), “Improving econometric prediction by machine learning”, Applied Economics Letters, 28(16), 1419-1425.
  • Hastie, T., Tibshirani, R., Friedman, J. (2009), “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer.
  • James, G., Witten, D., Hastie, T., Tibshirani, R. (2013), “An Introduction to Statistical Learning”, Springer.
  • Cameron, A.C., Trivedi, P.K. (2010), “Microeconometrics Using Stata”, Stata Press.

Terms & Conditions

  • Student registrations: attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (a valid student ID card or an authorised letter of enrolment).
  • Additional discounts are available for multiple registrations.
  • Delegates are provided with temporary licences for the principal software package(s) used in the course. These temporary training licences must be installed on your computer before the course starts.
  • Payment of course fees is required before the course start date.
  • Registration closes 1 calendar day before the start of the course.
  • Cancellations:
    • 100% of the fee is refunded for cancellations made more than 28 calendar days before the start of the course.
    • 50% of the fee is refunded for cancellations made between 14 and 28 calendar days before the start of the course.
    • No fee is refunded for cancellations made less than 14 calendar days before the start of the course.

The number of attendees is restricted. Please register early to guarantee your place.


All prices exclude VAT or local taxes where applicable.


Timberlake Consultants