7 No Multicollinearity
What this assumption means: Each predictor makes a unique contribution to explaining the outcome. That is, a substantial share of the information contained in one predictor is not contained in the other predictors (i.e., non-redundancy).
Why it matters: Multicollinearity inflates standard errors, making coefficient estimates less precise and hypothesis tests for the affected predictors less powerful.
How to diagnose violations: A predictor’s variance inflation factor (VIF) should be below a cutoff, such as 5 or 10.
How to address it: Combine problematic variables into composite or factor scores, drop variables, or use structural equation modeling to account for shared variance.
7.1 Example Model
If you have not already done so, download the example dataset, read about its variables, and import the dataset into Stata.
Then, use the code below to fit this page’s example model.
use acs2019sample, clear
reg income hours_worked weeks_worked age
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(3, 2757) = 165.57
Model | 1.3405e+12 3 4.4684e+11 Prob > F = 0.0000
Residual | 7.4405e+12 2,757 2.6988e+09 R-squared = 0.1527
-------------+---------------------------------- Adj R-squared = 0.1517
Total | 8.7810e+12 2,760 3.1815e+09 Root MSE = 51950
------------------------------------------------------------------------------
income | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
hours_worked | 1094.548 77.96601 14.04 0.000 941.6699 1247.425
weeks_worked | 379.7427 80.71919 4.70 0.000 221.4665 538.0189
age | 731.5352 62.10243 11.78 0.000 609.7632 853.3072
_cons | -40999.16 4476.558 -9.16 0.000 -49776.91 -32221.41
------------------------------------------------------------------------------
7.2 Statistical Tests
Use the variance inflation factor (VIF) to detect multicollinearity. It is based on how much of one predictor’s variance is explained by the other predictors, and it tells us how much a coefficient’s standard error is inflated by multicollinearity. (Standard errors are inflated by a factor of \(\sqrt{VIF}\).)
The formula for the VIF is \(\frac{1}{1-R^2}\), where \(R^2\) is obtained from a model where one predictor is regressed on all of the other predictors. Perfectly uncorrelated predictors have VIFs of 1, and perfectly correlated predictors have VIFs of infinity.
Different cutoffs are used for determining whether a VIF indicates multicollinearity, such as 5 (corresponding to \(R^2=0.8\)) or 10 (\(R^2=0.9\)).
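These numbers are easy to verify with Stata’s display command. The lines below are just arithmetic on the formulas above: the first two reproduce the cutoffs, and the last two show the corresponding standard error inflation.
display 1/(1 - 0.8)    // VIF cutoff of 5
display 1/(1 - 0.9)    // VIF cutoff of 10
display sqrt(5)        // SEs inflated by a factor of about 2.2
display sqrt(10)       // SEs inflated by a factor of about 3.2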
7.2.1 Understanding the VIF
After fitting a model, run the estat vif command.
estat vif
Variable | VIF 1/VIF
-------------+----------------------
weeks_worked | 1.22 0.822885
hours_worked | 1.20 0.831469
age | 1.01 0.988249
-------------+----------------------
Mean VIF | 1.14
This returns three VIFs, one for each predictor.
We will manually calculate one of the VIFs to enhance our understanding of them. We will use the VIF for hours_worked, 1.203, as an example.
Fit another model with just the predictors from our original model, where the predictor of interest (hours_worked) is used as the outcome. Our original formula was income ~ hours_worked + weeks_worked + age, so our new formula to get the VIF of hours_worked will be hours_worked ~ weeks_worked + age.
Then, find the unadjusted \(R^2\) in the model summary output.
reg hours_worked weeks_worked age
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(2, 2758) = 279.51
Model | 89988.4598 2 44994.2299 Prob > F = 0.0000
Residual | 443969.056 2,758 160.975002 R-squared = 0.1685
-------------+---------------------------------- Adj R-squared = 0.1679
Total | 533957.516 2,760 193.462868 Root MSE = 12.688
------------------------------------------------------------------------------
hours_worked | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weeks_worked | .4235731 .0179886 23.55 0.000 .3883006 .4588455
age | -.0063753 .0151667 -0.42 0.674 -.0361146 .0233641
_cons | 18.98107 1.031837 18.40 0.000 16.95782 21.00432
------------------------------------------------------------------------------
The multiple \(R^2\) is 0.1685. The VIF for hours_worked is \(\frac{1}{1-R^2} = \frac{1}{1-0.1685}=1.203\), and this matches what we saw earlier.
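If you prefer not to copy the \(R^2\) by hand, the same calculation can use the \(R^2\) that regress stores in e(r2) after the auxiliary model:
display 1/(1 - e(r2))
Run directly after the regression above, this prints the VIF for hours_worked (about 1.203).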
7.2.2 Polynomials and Interactions
We should not be immediately concerned when we find high VIFs in a model with polynomials or interaction terms. For example, see this model, which has a polynomial term (hours_worked squared) and an interaction term (weeks_worked and age). These terms are highly correlated with the corresponding simple effects, so their \(R^2\) values (and therefore their VIFs) are necessarily high.
In this case, we should fit another model without the polynomial and interaction terms and check the VIFs again. That model is just our original model, where the highest VIF was about 1.2, so there is no evidence of multicollinearity here. Note that this will not always be the case.
reg income hours_worked c.hours_worked#c.hours_worked c.weeks_worked##c.age
estat vif
Source | SS df MS Number of obs = 2,761
-------------+---------------------------------- F(5, 2755) = 99.76
Model | 1.3461e+12 5 2.6922e+11 Prob > F = 0.0000
Residual | 7.4349e+12 2,755 2.6987e+09 R-squared = 0.1533
-------------+---------------------------------- Adj R-squared = 0.1518
Total | 8.7810e+12 2,760 3.1815e+09 Root MSE = 51949
-----------------------------------------------------------------------------------------------
income | Coefficient Std. err. t P>|t| [95% conf. interval]
------------------------------+----------------------------------------------------------------
hours_worked | 1325.634 234.7095 5.65 0.000 865.4095 1785.858
|
c.hours_worked#c.hours_worked | -2.997316 2.919011 -1.03 0.305 -8.720987 2.726356
|
weeks_worked | 518.4146 169.1275 3.07 0.002 186.7852 850.0441
age | 892.4567 161.6889 5.52 0.000 575.4129 1209.5
|
c.weeks_worked#c.age | -3.857646 3.638934 -1.06 0.289 -10.99296 3.277668
|
_cons | -50439.61 8201.65 -6.15 0.000 -66521.62 -34357.61
-----------------------------------------------------------------------------------------------
Variable | VIF 1/VIF
-------------+----------------------
hours_worked | 10.90 0.091745
c. |
hours_worked#|
c. |
hours_worked | 10.12 0.098829
weeks_worked | 5.34 0.187436
age | 6.86 0.145784
c. |
weeks_worked#|
c.age | 12.10 0.082617
-------------+----------------------
Mean VIF | 9.06
7.3 Corrective Actions
If we find evidence of multicollinearity, we have two basic approaches. We also need to check the other regression assumptions, since a violation of one can lead to a violation of another.
- Combine predictors
- Create simple composite scores, such as sums or (un)weighted means (see the sketch following this list).
- Fit a measurement model (where correlated predictors load on latent variables), extract latent variable estimates (factor scores), and use these in a regular regression model. Note this approach assumes zero measurement error.
- Fit a structural equation model that includes measurement models.
- Drop predictors
- Simply remove predictors that are not essential to the research question until the VIFs of the focal predictors decrease. Be aware that this may bias the other coefficient estimates. Note how this may affect our conclusions about the other assumptions, especially linearity, which assumes our model is complete.
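As a sketch of the combining approach, suppose we had three highly correlated predictors x1, x2, and x3 (hypothetical variables, not in the example dataset). We could average them into a composite, or fit a measurement model and extract factor scores, and then use the single combined variable in place of the separate predictors:
* simple composite: the (unweighted) row mean of the hypothetical predictors
egen x_composite = rowmean(x1 x2 x3)
* factor score: fit a factor model to the same predictors and predict scores
factor x1 x2 x3
predict x_factor
* use one combined variable in place of the three correlated predictors
reg income x_composite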
After you have applied any corrections or changed your model in any way, you must re-check this assumption and all of the other assumptions.
Centering variables is often proposed as a remedy for multicollinearity, but it only helps in limited circumstances with polynomial or interaction terms.4 5
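As a minimal sketch of what centering looks like in the polynomial/interaction model above (the *_c variable names are just illustrative), we can subtract each predictor’s mean before forming the squared and product terms and then re-check the VIFs. Whether the VIFs drop meaningfully depends on the data, which is the point of the articles cited below.
summarize hours_worked, meanonly
generate hours_c = hours_worked - r(mean)
summarize weeks_worked, meanonly
generate weeks_c = weeks_worked - r(mean)
summarize age, meanonly
generate age_c = age - r(mean)
reg income hours_c c.hours_c#c.hours_c c.weeks_c##c.age_c
estat vif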
Iacobucci, D., Schneider, M. J., Popovich, D. L., & Bakamitsos, G. A. (2016). Mean centering helps alleviate “micro” but not “macro” multicollinearity. Behavior Research Methods, 48, 1308–1317. https://doi.org/10.3758/s13428-015-0624-x
Olvera Astivia, O. L., & Kroc, E. (2019). Centering in multiple regression does not always reduce multicollinearity: How to tell when your estimates will not benefit from centering. Educational and Psychological Measurement, 79(5), 813–826. https://doi.org/10.1177/0013164418817801