## Using Indicator Saturation to Detect Outliers and Structural Shifts

EViews 12 introduces a new technique to detect and model outliers and structural changes in data through indicator saturation. With this feature now available in the recently released EViews 12, we thought we'd give another demonstration of it.

### Indicator Saturation

Identifying changes in data is essential if we are to properly estimate models based upon these data. One way to detect changes would be to include in your regression dummy (indicator) variables for the observations at which a change might occur, and then decide whether each included indicator is a valid regressor. Such variables could include:

- **Impulse Indicators** (IIS): a dummy variable equal to zero everywhere other than a single value of one at period $t$. This indicator can be used to model single-observation outliers, and is equivalent to the **@isperiod** EViews function used at the date corresponding to $t$.
- **Step Indicators** (SIS): a step function variable equal to zero until $t$ and one thereafter. This indicator can be used to model a shift in the intercept of an equation, and is equivalent to the **@after** EViews function used at the date corresponding to $t$.
- **Trend Indicators** (TIS): a trend-break variable that is equal to zero until period $t$ and then follows a trend afterward. This indicator can be used to model a change in the trend of an equation (or the introduction of a trend term if one didn't previously exist), and is equivalent to the **@trendbr** EViews function used at the date corresponding to $t$.
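To make the three definitions concrete, here is a minimal Python sketch that builds each indicator as a list of length `T`, using 0-based observation indices. The function names are hypothetical, and the timing conventions (the step switching on at $t$ itself, the trend break counting 1, 2, 3, ... from the period after $t$) are assumptions that may differ from EViews' own `@after` and `@trendbr` in edge cases.

```python
def impulse(T, t):
    """IIS column: one at observation t, zero everywhere else."""
    return [1.0 if s == t else 0.0 for s in range(T)]


def step(T, t):
    """SIS column: zero before observation t, one from t onward."""
    return [1.0 if s >= t else 0.0 for s in range(T)]


def trend_break(T, t):
    """TIS column: zero up to observation t, then 1, 2, 3, ... (assumed timing)."""
    return [float(max(0, s - t)) for s in range(T)]


def saturate(T, kinds=("impulse", "step")):
    """Build every candidate indicator of the requested kinds for a sample of size T.

    A step starting at the first observation would just duplicate the constant,
    so this sketch starts step indicators at t = 1.
    """
    builders = {"impulse": impulse, "step": step, "trend": trend_break}
    names, columns = [], []
    for kind in kinds:
        first = 1 if kind == "step" else 0
        for t in range(first, T):
            names.append(f"{kind}_{t}")
            columns.append(builders[kind](T, t))
    return names, columns


print(impulse(5, 2))      # [0.0, 0.0, 1.0, 0.0, 0.0]
print(step(5, 2))         # [0.0, 0.0, 1.0, 1.0, 1.0]
print(trend_break(5, 2))  # [0.0, 0.0, 0.0, 1.0, 2.0]
names, _ = saturate(5)
print(len(names))         # 9 candidate indicators for only 5 observations
```

Saturating with both impulse and step indicators gives roughly $2T$ candidates for $T$ observations, so ordinary least squares cannot be applied to the full set directly.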

The problem with the approach of including these variables in a traditional regression setting is that unless you know the specific dates where changes occur, you can quickly run into a situation where you have more variables than observations (since you’ll be adding at least one indicator variable for each observation in your estimation sample!).

Fortunately, recent advancements in variable selection techniques have meant that we can now perform variable selection on models with many more variables than observations, and so can saturate our regression with complex combinations of indicator variables and let the variable selection technique choose which are the most appropriate indicators to use.

### AutoSearch/GETS

One of the new technologies introduced in EViews 12 is the **AutoSearch/GETS** algorithm for variable selection.

AutoSearch/GETS is a method of variable selection that follows the steps suggested by the AutoSEARCH algorithm of Escribano and Sucarrat (2011), which in turn builds upon the work in Hoover and Perez (1999), and is similar to the technology behind the **Autometrics™** module in **PcGive™**.

Mechanically the algorithm is similar to a backwards uni-directional stepwise method:

1. The model with all search variables (termed the general unrestricted model, GUM) is estimated and checked with a set of diagnostic tests.
2. A number of search paths are defined, one for each insignificant search variable in the GUM.
3. For each path, the insignificant variable defined in step 2 is removed, and then a series of further removal steps is taken, each time removing the most insignificant remaining variable and checking whether the current model still passes the diagnostic tests. If the diagnostic tests fail after a variable is removed, that variable is placed back into the model and prevented from being removed again along this path. Variable removal finishes once there are no more insignificant variables, or once no variable can be removed without failing the diagnostic tests.
4. Once all paths have been calculated, the final models produced by the paths are compared using an information criterion, and the best model is selected.
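The four steps above can be sketched in plain Python as a multi-path backward elimination. This is an illustration, not EViews' implementation: it uses its own small OLS routine, approximates p-values with the normal distribution, uses the Schwarz criterion as the information criterion, and stands in a hypothetical `diagnostics_ok` callback (which accepts every model by default) for the diagnostic test battery. All function names are illustrative.

```python
import math
import random

# Illustrative sketch only: not EViews' implementation of AutoSearch/GETS.


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(y, cols):
    """OLS with regressors given as a list of columns; returns (t-statistics, RSS)."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(beta[i] * cols[i][r] for i in range(k)) for r in range(n)]
    rss = sum(e * e for e in resid)
    s2 = rss / (n - k)
    return [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)], rss


def gets(y, fixed, search, alpha=0.05, diagnostics_ok=lambda y, cols: True):
    """Multi-path backward elimination over `search`; `fixed` regressors are always kept."""
    def columns(names):
        return list(fixed.values()) + [search[m] for m in names]

    def pvalues(names):
        tstats, _ = ols(y, columns(names))
        offset = len(fixed)
        # Two-sided p-values from a normal approximation to the t distribution.
        return {m: math.erfc(abs(tstats[offset + i]) / math.sqrt(2)) for i, m in enumerate(names)}

    def schwarz(names):
        _, rss = ols(y, columns(names))
        n, k = len(y), len(fixed) + len(names)
        return n * math.log(rss / n) + k * math.log(n)

    gum = list(search)
    p_gum = pvalues(gum)  # step 1 (hypothetical diagnostics are not applied to the GUM here)
    starts = [m for m in gum if p_gum[m] > alpha]  # step 2: one path per insignificant variable
    if not starts:
        return gum
    finals = []
    for first in starts:
        current = [m for m in gum if m != first]  # step 3: drop the path's starting variable
        locked = set()
        while True:
            p = pvalues(current)
            candidates = [m for m in current if p[m] > alpha and m not in locked]
            if not candidates:
                break
            worst = max(candidates, key=p.get)
            trial = [m for m in current if m != worst]
            if diagnostics_ok(y, columns(trial)):
                current = trial
            else:
                locked.add(worst)  # put it back; never try removing it again on this path
        finals.append(current)
    return min(finals, key=schwarz)  # step 4: pick the best terminal model


# Simulated data: y depends on x only; z0..z2 are irrelevant candidates.
rng = random.Random(0)
n = 40
x = [rng.uniform(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.1) for xi in x]
search = {"x": x, **{f"z{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(3)}}
selected = gets(y, {"const": [1.0] * n}, search)
print(selected)
```

With a strong true effect, `x` should survive every path, while the irrelevant `z` candidates are usually, though not necessarily always, removed.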

One of the advantages of AutoSearch/GETS is that the set of candidate variables can be split into blocks, with the search performed on each block one at a time; the variables selected from each block are then combined into a final set to be searched. This allows you to test more candidate variables than you have observations without creating singularities (as long as enough candidate variables are rejected), which makes it a perfect algorithm for indicator saturation studies.
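For impulse indicators in a constant-only model, the block idea has a closed form that makes it easy to sketch: including an impulse dummy for every observation in one block is equivalent to estimating the model on the other block alone, and each dummy's coefficient is that observation's forecast error. The sketch below is a simplified split-half procedure in that spirit, not EViews' algorithm: it uses two blocks, a fixed critical value of 2.58 (roughly a 1% two-sided normal test), and a single re-test pass at the end instead of a full GETS search. The function name and arguments are hypothetical.

```python
import math
import statistics


def split_half_iis(y, crit=2.58):
    """Simplified split-half impulse-indicator saturation for y_t = mu + e_t.

    Returns the 0-based indices of observations whose impulse indicator is retained.
    """
    T = len(y)
    blocks = [range(0, T // 2), range(T // 2, T)]
    retained = []
    for block in blocks:
        inside = set(block)
        # Dummying out every observation in `block` is equivalent to fitting the mean on
        # the other block only; each dummy coefficient equals y_t minus that mean.
        other = [y[s] for s in range(T) if s not in inside]
        mu = statistics.fmean(other)
        # Standard error of a one-step forecast error: sqrt(s^2 * (1 + 1/n)).
        se = math.sqrt(statistics.variance(other) * (1 + 1 / len(other)))
        retained += [t for t in block if abs(y[t] - mu) / se > crit]
    # Re-estimate with all retained indicators together and keep those still
    # significant (a single pass here, rather than a full GETS search).
    kept = [y[s] for s in range(T) if s not in retained]
    mu = statistics.fmean(kept)
    se = math.sqrt(statistics.variance(kept) * (1 + 1 / len(kept)))
    return [t for t in retained if abs(y[t] - mu) / se > crit]


y = [0.1 * (-1) ** t for t in range(20)]
y[5] = 5.0  # plant a single outlier
print(split_half_iis(y))  # [5]
```

Each stage only ever estimates about half as many dummies as there are observations, which is what keeps the regressions non-singular.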

### An Application with Consumption and Income

To demonstrate this feature, we will estimate a simple personal consumption equation, using log-difference of personal consumption as the dependent variable against a constant and log-differenced disposable income. This estimation is purely for demonstration of the saturation features in EViews 12, and should not be taken as worthy macroeconomic research!

Both data series were downloaded directly from the Federal Reserve Bank of St. Louis database, FRED, and contain monthly observations between 2002 and April 2020. We first estimate the baseline equation:

1. Select **Quick/Estimate Equation** to bring up the equation estimation dialog.
2. Enter our dependent variable **DLOG(CONS)** followed by a constant and our regressor **DLOG(INCOME)**.
3. Click OK.

If we click on the **Resids** button we can view a graph of the equation residuals.

Now we'll estimate a new equation in which we instruct EViews to detect both impulse (outlier) and step-shift (change in intercept) indicators, with the following steps:

1. Select **Quick/Estimate Equation** to bring up the equation estimation dialog.
2. Enter our dependent variable **DLOG(CONS)** followed by a constant and our regressor **DLOG(INCOME)**.
3. Switch to the **Options** tab and select **Auto-detection** under **Outliers/indicator saturation**.
4. Press the **Options** button and select both **Impulse** and **Step-shift** indicators.
5. Change the **Terminal condition p-value** to **0.01** (which will allow more indicators to enter the equation).
6. Click OK twice.
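As a rough guide to what the terminal p-value controls, each p-value corresponds to a critical absolute t-statistic that a retained indicator must exceed (about 1.96 at 0.05 and 2.58 at 0.01). The sketch below converts p-values to two-sided critical values with Python's `statistics.NormalDist`; this is a normal approximation, and EViews' own tests may instead use the t distribution with finite degrees of freedom. The helper name is hypothetical.

```python
from statistics import NormalDist


def critical_value(p):
    """Two-sided critical |t| for significance level p (normal approximation)."""
    return NormalDist().inv_cdf(1 - p / 2)


for p in (0.05, 0.01, 0.001):
    print(p, round(critical_value(p), 3))
```

A larger terminal p-value means a lower critical value, so more indicators clear the bar.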

The impact of these variables on the log-differenced income coefficient is dramatic, as is the resulting R-squared.

Viewing the residual graph shows that the large outliers have been removed, and that the locations of the detected indicators, shown by the vertical lines, correspond to the outliers we eyeballed in the original equation.