7 No Multicollinearity
What this assumption means: Each predictor makes a unique contribution to explaining the outcome. That is, a substantial share of the information contained in one predictor is not contained in the other predictors (i.e., non-redundancy).
Why it matters: Multicollinearity inflates standard errors, making coefficient estimates less precise and hypothesis tests for the affected predictors less powerful.
How to diagnose violations: A predictor’s variance inflation factor (VIF) should be below a cutoff, such as 5 or 10.
How to address it: Combine problematic variables into composite or factor scores, drop variables, or use structural equation modeling to account for shared variance.
7.1 Example Model
If you have not already done so, download the example dataset, read about its variables, and import the dataset into Stata.
Then, use the code below to fit this page’s example model.
use acs2019sample, clear
reg income hours_worked weeks_worked age
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(3, 2757) = 165.57
Model | 1.3405e+12 3 4.4684e+11 Prob > F = 0.0000
Residual | 7.4405e+12 2,757 2.6988e+09 R-squared = 0.1527
-------------+---------------------------------- Adj R-squared = 0.1517
Total | 8.7810e+12 2,760 3.1815e+09 Root MSE = 51950
------------------------------------------------------------------------------
income | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
hours_worked | 1094.548 77.96601 14.04 0.000 941.6699 1247.425
weeks_worked | 379.7427 80.71919 4.70 0.000 221.4665 538.0189
age | 731.5352 62.10243 11.78 0.000 609.7632 853.3072
_cons | -40999.16 4476.558 -9.16 0.000 -49776.91 -32221.41
------------------------------------------------------------------------------
7.2 Statistical Tests
Use the variance inflation factor (VIF) to detect multicollinearity. It is based on how much of one predictor’s variance is explained by the other predictors, and it tells us how much a coefficient’s standard error is inflated by multicollinearity. (Standard errors are inflated by a factor of \(\sqrt{VIF}\).)
The formula for the VIF is \(\frac{1}{1-R^2}\), where \(R^2\) is obtained from a model where one predictor is regressed on all of the other predictors. Perfectly uncorrelated predictors have VIFs of 1, and perfectly correlated predictors have VIFs of infinity.
Different cutoffs are used for determining whether a VIF indicates multicollinearity, such as 5 (corresponding to \(R^2=0.8\)) or 10 (\(R^2=0.9\)).
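These numbers are easy to verify with Stata’s display command. The lines below are just arithmetic on the formulas above: the first two reproduce the cutoffs, and the last two show the corresponding standard error inflation.
display 1/(1 - 0.8)    // VIF cutoff of 5
display 1/(1 - 0.9)    // VIF cutoff of 10
display sqrt(5)        // SEs inflated by a factor of about 2.2
display sqrt(10)       // SEs inflated by a factor of about 3.2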
7.2.1 Understanding the VIF
After fitting a model, run the estat vif command.
estat vif
Variable | VIF 1/VIF
-------------+----------------------
weeks_worked | 1.22 0.822885
hours_worked | 1.20 0.831469
age | 1.01 0.988249
-------------+----------------------
Mean VIF | 1.14
This returns three VIFs, one for each predictor.
We will manually calculate one of the VIFs to enhance our understanding of them. We will use the VIF for hours_worked, 1.203, as an example.
Fit another model with just the predictors from our original model, where the predictor of interest (hours_worked) is used as the outcome. Our original formula was income ~ hours_worked + weeks_worked + age, so our new formula to get the VIF of hours_worked will be hours_worked ~ weeks_worked + age.
Then, find the unadjusted \(R^2\) in the model summary output.
reg hours_worked weeks_worked age
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(2, 2758) = 279.51
Model | 89988.4598 2 44994.2299 Prob > F = 0.0000
Residual | 443969.056 2,758 160.975002 R-squared = 0.1685
-------------+---------------------------------- Adj R-squared = 0.1679
Total | 533957.516 2,760 193.462868 Root MSE = 12.688
------------------------------------------------------------------------------
hours_worked | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weeks_worked | .4235731 .0179886 23.55 0.000 .3883006 .4588455
age | -.0063753 .0151667 -0.42 0.674 -.0361146 .0233641
_cons | 18.98107 1.031837 18.40 0.000 16.95782 21.00432
------------------------------------------------------------------------------
The multiple \(R^2\) is 0.1685. The VIF for hours_worked is \(\frac{1}{1-R^2} = \frac{1}{1-0.1685}=1.203\), and this matches what we saw earlier.
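If you prefer not to copy the \(R^2\) by hand, the same calculation can use the \(R^2\) that regress stores in e(r2) after the auxiliary model:
display 1/(1 - e(r2))
Run directly after the regression above, this prints the VIF for hours_worked (about 1.203).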
7.2.2 Polynomials and Interactions
We should not be immediately concerned when we find high VIFs in a model with polynomials or interaction terms. For example, see this model, which has a polynomial term (hours_worked squared) and an interaction term (weeks_worked and age). These terms are highly correlated with the corresponding simple effects, so their \(R^2\) values (and therefore their VIFs) are necessarily high.
In this case, we should fit another model without the polynomial and interaction terms and check the VIFs again. That model is just our original model, where the highest VIF was about 1.2, so there is no evidence of multicollinearity here. Note that this will not always be the case.
reg income hours_worked c.hours_worked#c.hours_worked c.weeks_worked##c.age
estat vif
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(5, 2755) = 99.76
Model | 1.3461e+12 5 2.6922e+11 Prob > F = 0.0000
Residual | 7.4349e+12 2,755 2.6987e+09 R-squared = 0.1533
-------------+---------------------------------- Adj R-squared = 0.1518
Total | 8.7810e+12 2,760 3.1815e+09 Root MSE = 51949
-----------------------------------------------------------------------------------------------
income | Coefficient Std. err. t P>|t| [95% conf. interval]
------------------------------+----------------------------------------------------------------
hours_worked | 1325.634 234.7095 5.65 0.000 865.4095 1785.858
|
c.hours_worked#c.hours_worked | -2.997316 2.919011 -1.03 0.305 -8.720987 2.726356
|
weeks_worked | 518.4146 169.1275 3.07 0.002 186.7852 850.0441
age | 892.4567 161.6889 5.52 0.000 575.4129 1209.5
|
c.weeks_worked#c.age | -3.857646 3.638934 -1.06 0.289 -10.99296 3.277668
|
_cons | -50439.61 8201.65 -6.15 0.000 -66521.62 -34357.61
-----------------------------------------------------------------------------------------------
Variable | VIF 1/VIF
-------------+----------------------
hours_worked | 10.90 0.091745
c. |
hours_worked#|
c. |
hours_worked | 10.12 0.098829
weeks_worked | 5.34 0.187436
age | 6.86 0.145784
c. |
weeks_worked#|
c.age | 12.10 0.082617
-------------+----------------------
Mean VIF | 9.06
7.3 Corrective Actions
If we find evidence of multicollinearity, we have two basic approaches. We also need to check the other regression assumptions, since a violation of one can lead to a violation of another.
- Combine predictors
- Create simple composite scores, such as sums or (un)weighted means (see the sketch following this list).
- Fit a measurement model (where correlated predictors load on latent variables), extract latent variable estimates (factor scores), and use these in a regular regression model. Note this approach assumes zero measurement error.
- Fit a structural equation model that includes measurement models.
- Drop predictors
- Simply remove predictors that are not essential to the research question until the VIFs of the focal predictors decrease. Be aware that this may bias the other coefficient estimates. Note how this may affect our conclusions about the other assumptions, especially linearity, which assumes our model is complete.
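As a sketch of the combining approach, suppose we had three highly correlated predictors x1, x2, and x3 (hypothetical variables, not in the example dataset). We could average them into a composite, or fit a measurement model and extract factor scores, and then use the single combined variable in place of the separate predictors:
* simple composite: the (unweighted) row mean of the hypothetical predictors
egen x_composite = rowmean(x1 x2 x3)
* factor score: fit a factor model to the same predictors and predict scores
factor x1 x2 x3
predict x_factor
* use one combined variable in place of the three correlated predictors
reg income x_composite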
After you have applied any corrections or changed your model in any way, you must re-check this assumption and all of the other assumptions.
Centering variables is often proposed as a remedy for multicollinearity, but it only helps in limited circumstances with polynomial or interaction terms.4 5
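As a minimal sketch of what centering looks like in the polynomial/interaction model above (the *_c variable names are just illustrative), we can subtract each predictor’s mean before forming the squared and product terms and then re-check the VIFs. Whether the VIFs drop meaningfully depends on the data, which is the point of the articles cited below.
summarize hours_worked, meanonly
generate hours_c = hours_worked - r(mean)
summarize weeks_worked, meanonly
generate weeks_c = weeks_worked - r(mean)
summarize age, meanonly
generate age_c = age - r(mean)
reg income hours_c c.hours_c#c.hours_c c.weeks_c##c.age_c
estat vif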
Iacobucci, D., Schneider, M. J., Popovich, D. L., & Bakamitsos, G. A. (2016). Mean centering helps alleviate “micro” but not “macro” multicollinearity. Behavior Research Methods, 48, 1308–1317. https://doi.org/10.3758/s13428-015-0624-x
Olvera Astivia, O. L., & Kroc, E. (2019). Centering in multiple regression does not always reduce multicollinearity: How to tell when your estimates will not benefit from centering. Educational and Psychological Measurement, 79(5), 813–826. https://doi.org/10.1177/0013164418817801