Download the PDF file of the course outline

This course, presented as a two-part sequence, reviews the application of machine learning techniques to both prediction problems and so-called causal problems, where a firm or policymaker needs to understand the impact of some form of intervention on a heterogeneous population.

One example is a firm that wishes to understand how a change in pricing affects both aggregate demand and the demand of different segments of the population. In another example, a policymaker seeks to understand the impact of an intervention in terms of both some form of average effect and how individuals differ in the magnitude of the effect. Examples include the impact of job training programmes, the impact of education policies in developing economies, and the differential impact of drugs on survival and recovery.

In this context we make the distinction between the ex post assessment of a change and the ex ante identification of characteristics of individuals that are predictive of the likely impact of such a change.

Using Breiman’s (2001) notion of two cultures in the use of statistical modelling, the course begins with a review of the fundamental differences between machine learning and econometrics.

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

We contrast a modelling approach where the analyst makes certain assumptions about model specification, including functional form, with an approach where the data mechanism is presumed unknown. In this context we consider the econometrician’s concern for internal validity, alongside the focus within machine learning on ensuring that a model is robust in the sense of generalising to unseen data (external validity).

The course will focus upon topics at the intersection of machine learning and econometrics, covering a mix of theory and applications. In making the distinction between models which are used to solve a prediction problem and models which are used to estimate some form of causal effect, we introduce participants to identification strategies in econometrics. Here it is important to demonstrate how empirical strategies such as unconfoundedness, instrumental variables, and difference-in-difference can be used alongside machine learning methods for prediction.

As a point of departure we make reference to the two broad types of machine learning in terms of supervised and unsupervised learning, making the link to nonparametric regression. We then consider a number of fundamental building blocks, starting with error decomposition in terms of bias and variance, the role of training, estimation and test samples, and the role of regularization as a means to avoid overfitting.
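For reference, the error decomposition mentioned above is usually written as follows. This is the standard textbook statement rather than a formula taken from the course notes. The symbols $f$, $\hat{f}$, $\varepsilon$ and $\sigma^2$ are our own notation, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and a new observation that is independent of the training sample.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Regularization can be read against this expression: it typically accepts a small increase in the bias term in exchange for a larger reduction in the variance term.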

In covering the three broad areas where machine learning is used, namely prediction, classification and causal effects, for each case we link the exposition to a parametric benchmark. For prediction we consider the piecewise nonlinear regression model, for classification we review the fundamentals of parametric binary choice models, and for causal effects we consider the specification of models with instrumental variables and treatment effects.

Participants will also be introduced to the use of ensemble methods as an averaging and regularization device. In this context we will explore a number of general methods for model averaging, including bootstrap aggregation (so-called bagging) and random forests. For machine learning models used for prediction, classification and causal effects we provide examples using Stata and Python.
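To make the idea of bagging concrete, the following is a minimal sketch in plain Python, not taken from the course materials. It assumes a single covariate and uses a one-split regression "stump" as the base learner; the function names `fit_stump` and `bagged_predictor` are hypothetical and chosen for illustration only.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising the sum of squared errors."""
    best = None
    # Skipping the smallest value guarantees that both sides of a split are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        mean_left = statistics.fmean(left)
        mean_right = statistics.fmean(right)
        sse = sum((y - mean_left) ** 2 for y in left)
        sse += sum((y - mean_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_left, mean_right)
    if best is None:
        # All covariate values are identical: fall back to the sample mean.
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, threshold, mean_left, mean_right = best
    return lambda x: mean_left if x < threshold else mean_right


def bagged_predictor(xs, ys, n_models=50, seed=0):
    """Average stumps fitted on bootstrap resamples (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(m(x) for m in models)
```

Each stump on its own is a high-variance predictor. Averaging over bootstrap resamples smooths the prediction, which is the sense in which bagging acts as a regularization device. A random forest additionally randomises the covariates considered at each split, which this sketch does not attempt.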

The introduction of time-of-use (`TOU`) electricity prices is an example of a policy with heterogeneous effects. Consumers in different socioeconomic groups, with distinct historical intra-day load profiles and behavioural characteristics, may respond differently to the introduction of tariffs that charge different prices for electricity at different times of the day. Customers who can (cannot) adapt their consumption profile to `TOU` tariffs will accrue a benefit (cost). Those who consume electricity at more expensive peak periods, and who are unable to change their consumption patterns, could end up paying significantly more.

Analysts often describe subpopulations that are of interest a priori, and which can be defined by a known combination of covariates. However, increasingly researchers face a selection problem given a large number of possible covariates alongside uncertainty as to which covariates are important for heterogeneity, and what functional form best describes the association between these covariates and treatment effects.

In assessing whether demographic variables are informative in terms of the impact of `TOU` tariffs on load profiles, the Customer-Led Network Revolution project noted:

... a relatively consistent average demand profile across the different demographic groups, with much higher variability within groups than between them. This high variability is seen both in total consumption and in peak demand.

In addition, the question of which demographic variables are important when considering the impact of energy policies ignores the fact that many of these variables should be considered together, in a multiplicative fashion. One reason for this finding might be that, for example, it is the (unknown) combination of income, household size, education, and daily usage patterns that describes a particularly responsive or unresponsive group.

Throughout the course we make reference to the problem of identifying the distributional effects of some intervention, without succumbing to the problems of data mining (multiplicity). Here we examine the empirical problem of identifying the characteristics of winners and losers following the introduction of a Time-of-Use (`TOU`) pricing scheme, where the price per kWh of electricity usage depends on the time of consumption. The pricing scheme is enabled by smart meters, which record consumption every half-hour.

Using machine learning methods we describe the association between the effect of `TOU` pricing schemes on household electricity demand and a range of variables that are observable before the introduction of the new pricing schemes.

- L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Klein-Verlag, 1990.
- J. Friedman, T. Hastie, R. Tibshirani. The Elements of Statistical Learning. Springer, 2009.

**Part 1 (2-days) - Machine Learning for Prediction Problems**

**Part 2 (2-days) - Machine Learning for Classification and Causal Effects**

**Recommended Reading List**

The course is designed both to provide the tools to undertake projects using machine learning (ML) and, critically, to ensure that participants understand and can communicate how the methods work.

Towards this objective, on Day 1, Session 1 we introduce participants to the vernacular of machine learning tools.

In Session 2 we further explore the links between machine learning, econometrics and data mining. We also examine how ML utilises data mining tools, suitably adapted to allow inference. The course is designed in such a way as to ensure that participants are given the necessary context to understand the genesis of ML methods.

To this end, the first point of departure reviews the ordinary least squares estimator and provides links to ML using kernel density estimation.

We also provide the necessary links to econometrics and nonparametric statistics.

- High-level overview of Machine Learning and AI
- Machine Learning: The Vernacular
- The Nature of Prediction Problems
- Prediction, Evaluation and Causal Inference

- Econometrics
- Machine Learning: Tools and Vernacular
- Bias Variance Tradeoff
- Regularisation
- Multiplicity and P-values
- Ensemble Learning
- Point of Departure I: The Ordinary Least Squares Estimator
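As a small illustration of the first point of departure, the sketch below computes the ordinary least squares estimator for a regression with a single covariate, using the closed-form expressions for the slope and intercept. It is our own example rather than course code, and the function name `ols_simple` is hypothetical.

```python
def ols_simple(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sample covariance of x and y divided by sample variance of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data generated exactly by y = 1 + 2x, so the fit recovers these coefficients.
intercept, slope = ols_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The estimator imposes a linear functional form on the conditional mean. The nonparametric and machine learning methods covered later relax that assumption.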

Day 2, Session 1 begins with the second point of departure - high dimensional methods in statistics. These methods are used when analysts face a big data problem in terms of which of a large set of explanatory variables to include in a regression model.

We follow this with a practical where participants can explore the use of regularised regression tools with a number of empirical applications.

In Session 2 we provide an introduction to a number of machine learning methods including regression trees and forests.

This is then followed by a practical where we examine the use of ML methods for prediction.

- High Dimensional Methods
- Least Absolute Shrinkage and Selection Operator (LASSO)
- Choosing λ
- Causal Inference in High-Dimensions
- LASSO For Treatment Models
- Double LASSO
- Practical: Regularized Regression
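The mechanics of the LASSO listed above can be sketched with a short coordinate descent routine. This is an illustrative implementation under stated assumptions, not the routine used in the practical: there is no intercept (the outcome and covariates are assumed to be centred), the objective is taken to be (1/2n) times the residual sum of squares plus λ times the sum of absolute coefficients, and the names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, setting it exactly to zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the LASSO; X is a list of rows, y a list of outcomes."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # sum of squares of column j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            if z == 0.0:
                beta[j] = 0.0  # a column of zeros carries no information
            else:
                beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta
```

The soft-thresholding step is what sets small coefficients exactly to zero, so the estimator performs variable selection as well as shrinkage. In practice λ is commonly chosen by cross-validation.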

- Machine Learning and Decision Trees
- Machine Learning: Terminology and Concepts
- An Overview of Regression Trees
- The Bias-Variance Tradeoff
- Training, Testing and Cross Validation
- Regularization: Variance reduction and Ensemble Learning
- Practical: Machine Learning for Prediction
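The role of training and test samples in the topics above can be illustrated with a minimal k-fold cross-validation routine. This is a sketch in plain Python rather than the code used in the practical; `k_fold_indices`, `cross_val_mse` and `fit_mean` are hypothetical names, and `fit` is assumed to be any function that takes training data and returns a prediction function.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, fit, k=5, seed=0):
    """Mean squared prediction error, with each fold held out once as a test set."""
    squared_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared_errors.extend((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)


def fit_mean(xs, ys):
    """Baseline learner: always predict the training-sample mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y
```

Because every observation is predicted by a model that never saw it, the resulting error estimates out-of-sample performance. It can therefore be used to compare learners or to choose a tuning parameter such as tree depth.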

In Part 2, a separate 2-day follow-up course, we explore in more detail some of the concepts introduced in Part 1, and extend the coverage to include machine learning methods for both classification problems and causal effects.

The material on causal effects covers a relatively new and rapidly expanding field. It takes as its point of departure the literature on treatment effects and examines the potential of ML to address a number of policy problems. A useful point of reference here is the work of this year's Nobel Laureate, Esther Duflo.

Part 2 of this course is also constructed so that participants can take this module without having taken Part 1. That said, the overall learning experience is greater if participants take Parts 1 and 2 as a sequence.

On Day 1, Session 1, we review some of the fundamentals of machine learning that were introduced during Part 1. This includes the use of ML for prediction, classification and causal effects, alongside key methodological concepts such as the bias-variance tradeoff and methods to achieve regularisation.

In Session 2 we examine the use of ML tools applied to so-called classification problems. A useful frame of reference here is the decision to grant a loan or provide some form of service such as insurance. These models utilise characteristics of individuals/firms to understand the determinants of key outcomes such as loan default or an excess number of claims.

- Review of Part 1
- Bias-Variance trade-off, overfitting and prediction
- Prediction, Evaluation and Causal Inference

- Classification problems
- Parametric Benchmarks: binary choice models
- Application: Surviving the Titanic with Python integration
- Application: Credit Card default

Day 2, Session 1 begins with the third point of departure - programme evaluation and treatment effects. We will review the econometric methodology, which includes methods that both handle endogeneity and move away from parametric functional forms. We make reference to the work of the Nobel Laureate Esther Duflo, who has made significant contributions to the use of randomised controlled trials, in addition to the utilisation of machine learning methods in this context.

In Session 2 we examine the use of machine learning methods for causal inference. Relative to many econometric methods, studies that employ ML techniques have sought to exploit so-called big data to provide a coherent approach to uncovering variation in treatment effects without succumbing to the pitfalls of data mining.

This is followed by a practical where we examine the use of ML methods applied to the impact of time-of-use electricity pricing on individual-level demand response. A key question here is whether it is possible to identify household characteristics that enable policymakers to identify so-called winners and losers once we move to a system where prices vary throughout the day.

- Overview
- Ignorability of Treatment
- Endogenous Selection
- Matching Estimators
- The Difference-in-Difference Estimator
- Application: Job Training Programs
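The difference-in-difference estimator listed above can be sketched in its simplest two-group, two-period form. The example is ours, not drawn from the job training application; the function name `diff_in_diff` and the record layout `(treated, post, outcome)` are assumptions made for illustration.

```python
def diff_in_diff(records):
    """Two-group, two-period estimator from (treated, post, outcome) records."""
    totals = {}
    for treated, post, outcome in records:
        total, count = totals.get((treated, post), (0.0, 0))
        totals[(treated, post)] = (total + outcome, count + 1)
    mean = {cell: total / count for cell, (total, count) in totals.items()}
    # Change over time in the treated group, net of the change in the control group.
    change_treated = mean[(True, True)] - mean[(True, False)]
    change_control = mean[(False, True)] - mean[(False, False)]
    return change_treated - change_control


records = [
    (True, False, 10.0), (True, False, 12.0),   # treated group, before
    (True, True, 15.0), (True, True, 17.0),     # treated group, after
    (False, False, 8.0), (False, False, 10.0),  # control group, before
    (False, True, 10.0), (False, True, 12.0),   # control group, after
]
effect = diff_in_diff(records)
```

The estimate has a causal interpretation only under the parallel trends assumption, namely that the treated group would have followed the same change over time as the control group in the absence of treatment.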

- Causal Trees
- Honest Estimation
- Forests and Variance Reduction Methods
- Testing for Heterogeneity in Treatment Effects
- Application: Time of Use Tariffs and Smart Meter Data
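The idea behind honest estimation can be sketched as follows: one half of the sample is used to choose where to split, and the other half is used to estimate the treatment effect within each leaf. This is a deliberately simplified illustration with a single covariate and a single split, assuming randomly assigned treatment. It is not the causal tree algorithm itself; the splitting rule (largest gap between leaf effects) and the names `effect`, `choose_split` and `honest_leaf_effects` are our own assumptions.

```python
import random
import statistics


def effect(rows):
    """Treated-minus-control difference in mean outcomes; None if a group is empty."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    if not treated or not control:
        return None
    return statistics.fmean(treated) - statistics.fmean(control)


def choose_split(rows):
    """Pick the threshold on x giving the largest gap between leaf effects."""
    best_threshold, best_gap = None, -1.0
    for threshold in sorted({x for x, _, _ in rows}):
        left = effect([r for r in rows if r[0] < threshold])
        right = effect([r for r in rows if r[0] >= threshold])
        if left is None or right is None:
            continue
        gap = abs(left - right)
        if gap > best_gap:
            best_threshold, best_gap = threshold, gap
    return best_threshold


def honest_leaf_effects(rows, seed=0):
    """rows are (x, w, y) tuples with w = 1 for treated units and 0 otherwise."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    split_sample, estimation_sample = rows[:half], rows[half:]
    threshold = choose_split(split_sample)  # structure chosen on one half
    if threshold is None:
        return None
    left = effect([r for r in estimation_sample if r[0] < threshold])
    right = effect([r for r in estimation_sample if r[0] >= threshold])
    return threshold, left, right  # effects estimated on the other half
```

Because the leaf estimates come from data that played no part in choosing the split, they are not inflated by the search over thresholds. This is one way of examining heterogeneity without succumbing to the multiplicity problem.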

For references, see the PDF file of the course outline (including suggested reading).

- Some knowledge of basic statistics and econometrics is required: the notion of conditional expectation and related properties; point and interval estimation; the regression model and related properties; probit and logit regression.
- Basic knowledge of Stata software