
GAM(L)A: An econometric model for interpretable Machine Learning

In recent years, machine learning (ML) algorithms such as random forests and gradient boosting have received considerable attention in the literature and have overshadowed traditional econometric models in most applications. Econometrics and ML have developed in parallel, yet both can be used to build predictive models. Econometrics relies on probabilistic models that describe economic phenomena, whereas ML relies on algorithms that learn patterns directly from the data. Recently, ML algorithms have been shown to be more effective than traditional econometric approaches at modelling complex relationships. Unlike traditional econometric models, these algorithms can capture complex non-linear relationships through non-parametric approaches, which results in higher predictive performance. This predictive advantage, together with several other benefits, has led to the adoption of ML techniques in several industries.

However, ML algorithms raise an important issue for industry: their lack of interpretability. Most of these algorithms are considered “black boxes”, i.e., their opacity produces predictions and decision processes that users cannot easily interpret. This lack of interpretability is currently one of the main limitations of ML algorithms. It raises concerns in many fields where the predictions made by a model must be justified, such as medicine, law, the military and finance. In the financial industry, for example, executives need to understand a model to justify their decisions, and regulators require interpretability to ensure that the algorithms are fair. Indeed, the lack of interpretability of ML algorithms is one of the major concerns of financial regulators regarding the governance of artificial intelligence in the financial industry.

Within this context, Flachaire et al. (2022) propose a class of interpretable models. Instead of developing methods to explain the predictions of black boxes, they design flexible models that are interpretable by construction. These models are called GAM-lasso (GAMLA) and GAM-autometrics (GAMA), or GAM(L)A for short, and they combine the predictive performance of ML approaches with the inherent interpretability of econometric models. Formally, this class of models is based on a generalized additive model (GAM) augmented with variables assumed to have a linear effect on the dependent variable. More specifically, these linear terms are interactions between pairs of covariates. The number of such interactions can be large, so the authors perform variable selection on the interactions to avoid overfitting, using the lasso (GAMLA) and Autometrics (GAMA) approaches. Finally, the models involve both linear terms (the interaction effects) and non-linear terms (the smooth functions of the GAM). For this reason, the variable selection is performed not on the raw data but on data filtered with the double-residual approach of Robinson (1988), which partials out the smooth components from both the dependent variable and the interaction variables before the linear terms are selected.
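The Python sketch below illustrates the GAMLA selection step. It is a minimal illustration, not the authors' implementation, and it rests on several assumptions:

* The candidate linear terms are products of covariate pairs.
* The GAM smooth functions are approximated by an unpenalised additive cubic regression-spline basis fitted by least squares.
* The selection uses a plain coordinate-descent lasso with a fixed penalty `lam`. In practice this penalty would be tuned, e.g. by cross-validation.

All function names (`additive_basis`, `smooth_residuals`, `lasso_cd`, `gamla_select`) and the simulated data are hypothetical. The Autometrics variant (GAMA) is not covered.

```python
import itertools

import numpy as np


def additive_basis(Z, n_knots=5):
    """Additive cubic regression-spline basis: intercept plus one block per column.

    Hypothetical stand-in for the GAM smooth terms f_j(x_j): each column gets
    x, x^2, x^3 and truncated cubics at interior quantile knots.
    """
    blocks = [np.ones((Z.shape[0], 1))]
    for j in range(Z.shape[1]):
        z = Z[:, j]
        knots = np.quantile(z, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
        trunc = [np.clip(z - k, 0.0, None) ** 3 for k in knots]
        blocks.append(np.column_stack([z, z**2, z**3, *trunc]))
    return np.hstack(blocks)


def smooth_residuals(B, V):
    """Residuals of V (vector or matrix) after least-squares projection on basis B."""
    coef, *_ = np.linalg.lstsq(B, V, rcond=None)
    return V - B @ coef


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for min_b (1/2n)||y - Xb||^2 + lam * ||b||_1.

    No intercept: X and y are assumed centred (true for the residuals used below).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y.astype(float).copy()
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (j's own fit added back).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def gamla_select(X, y, lam):
    """Double-residual lasso on all pairwise interactions (hypothetical helper)."""
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    inter = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    B = additive_basis(X)
    # Robinson-style filtering: remove the additive (smooth) part from both the
    # response and every candidate interaction before running the selection.
    y_tilde = smooth_residuals(B, y)
    inter_tilde = smooth_residuals(B, inter)
    scale = inter_tilde.std(axis=0)  # standardise so a single lam suits every column
    beta = lasso_cd(inter_tilde / scale, y_tilde, lam)
    return [pairs[m] for m in np.flatnonzero(beta)]


# Simulated example: two non-linear main effects and one true interaction (x0 * x2).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 2.0 * X[:, 0] * X[:, 2] + 0.5 * rng.standard_normal(500)
selected = gamla_select(X, y, lam=0.5)
print(selected)
```

The simulated data contain a single true interaction, between `x0` and `x2`, so the filtered lasso should keep only that pair. Filtering first means the smooth main effects (`sin(x0)`, `x1**2`) are absorbed by the additive basis and do not leak into the interaction terms. A complete fit would then presumably re-estimate the GAM with the selected interactions entered as linear terms.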

Their approach has several advantages. First, the model is fundamentally interpretable: GAM(L)A inherits the ease of interpretation of traditional econometric models. The smooth functions give a simple picture of the estimated relationship between the dependent variable and each predictor. The interaction effects can be interpreted as in a simple linear model, because they enter the model linearly. Moreover, as in standard econometric models, the effect of each predictor can easily be measured through its marginal effect. Unlike black-box ML algorithms, GAM(L)A therefore makes prediction and decision processes simple to interpret. This class of models is also consistent with the recent literature that promotes inherently interpretable models over post-hoc explanations of black-box ML models.
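The block below writes out a stylised version of the model and the marginal effects mentioned above. The notation is ours and may differ from the paper's exact specification. It assumes the following:

* $f_j$ is the smooth function for predictor $j$.
* $\mathcal{S}$ is the set of selected covariate pairs, with $\beta_{kj} := \beta_{jk}$.
* $\mathbb{E}[\varepsilon_i \mid x_i] = 0$.

```latex
\begin{aligned}
y_i &= \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \sum_{(j,k) \in \mathcal{S}} \beta_{jk}\, x_{ij}\, x_{ik} + \varepsilon_i, \\[4pt]
\mathrm{ME}_{ij} &= \frac{\partial\, \mathbb{E}[y_i \mid x_i]}{\partial x_{ij}}
  = f_j'(x_{ij}) + \sum_{k \,:\, \{j,k\} \in \mathcal{S}} \beta_{jk}\, x_{ik}, \\[4pt]
\overline{\mathrm{ME}}_j &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{ME}_{ij}.
\end{aligned}
```

Because each interaction enters linearly, differentiating $\beta_{jk} x_{ij} x_{ik}$ with respect to $x_{ij}$ simply gives $\beta_{jk} x_{ik}$. The marginal effect of a predictor is therefore the slope of its own smooth curve, shifted by the values of the covariates it interacts with.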

Second, and importantly, the authors' empirical applications show that GAM(L)A competes with sophisticated ML algorithms in terms of predictive performance. This reinforces the idea that parametric models can achieve outstanding forecasting performance when they are well specified.

Authors: Emmanuel Flachaire (1), Sullivan Hué (1), Sébastien Laurent (2) and Gilles Hacheme (1)

  1. Aix-Marseille University (Aix-Marseille School of Economics), CNRS & EHESS, France.
  2. Aix-Marseille University (Aix-Marseille School of Economics), CNRS & EHESS, Aix-Marseille Graduate School of Management – IAE, France.



Timberlake Consultants