5 Homoscedasticity

What this assumption means: The residuals have equal variance (homoscedasticity) for every value of the fitted values and of the predictors.

Why it matters: Homoscedasticity is necessary to calculate accurate standard errors for parameter estimates.

How to diagnose violations: Visually check plots of residuals against fitted values or predictors for constant variance, and use the Breusch-Pagan test against heteroscedaticity (non-constant variance).

How to address it: Modify the model, fit a generalized linear model, or run a weighted least squares regression.

5.1 Example Model

If you have not already done so, download the example dataset, read about its variables, and import the dataset into Stata.

Then, use the code below to fit this page’s example model.

use acs2019sample, clear
reg weeks_worked age hours_worked commute_time i.education
      Source |       SS           df       MS      Number of obs   =     2,316
-------------+----------------------------------   F(8, 2307)      =     53.24
       Model |  35211.0279         8  4401.37849   Prob > F        =    0.0000
    Residual |  190738.537     2,307  82.6781694   R-squared       =    0.1558
-------------+----------------------------------   Adj R-squared   =    0.1529
       Total |  225949.565     2,315  97.6024038   Root MSE        =    9.0928

-------------------------------------------------------------------------------
 weeks_worked | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
          age |   .0474136    .012662     3.74   0.000     .0225835    .0722436
 hours_worked |   .2584471   .0148671    17.38   0.000     .2292928    .2876014
 commute_time |  -.0037508   .0086126    -0.44   0.663      -.02064    .0131385
              |
    education |
 High school  |   4.762103   .8713987     5.46   0.000     3.053297     6.47091
Some college  |   3.552516   .8966919     3.96   0.000      1.79411    5.310922
Associate'..  |   5.876652    .958106     6.13   0.000     3.997813    7.755491
Bachelor's..  |   4.164039   .9040732     4.61   0.000     2.391158     5.93692
Advanced d..  |   3.997863   1.026043     3.90   0.000       1.9858    6.009926
              |
        _cons |   32.34336   1.006814    32.12   0.000     30.36901    34.31772
-------------------------------------------------------------------------------

5.2 Statistical Tests

Use the Breusch-Pagan test to assess homoscedasticity. The Breusch-Pagan test regresses the residuals on the fitted values or predictors and checks whether they can explain any of the residual variance. A small p-value, then, indicates that residual variance is non-constant (heteroscedastic).

Run Breusch-Pagan test with estat hettest. The default is to regress the residuals on the fitted values.

estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
Assumption: Normal error terms
Variable: Fitted values of weeks_worked

H0: Constant variance

    chi2(1) = 993.46
Prob > chi2 = 0.0000

The small p-value leads us to reject the null hypothesis of homoscedasticity and infer that the error variance is non-constant.

After estat hettest, we can specify one or more variables to test whether the variance is non-constant for these terms.

estat hettest commute_time
estat hettest age hours_worked
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
Assumption: Normal error terms
Variable: commute_time

H0: Constant variance

    chi2(1) =   1.88
Prob > chi2 = 0.1698


Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
Assumption: Normal error terms
Variables: age hours_worked

H0: Constant variance

    chi2(2) = 910.62
Prob > chi2 = 0.0000

We failed to reject homoscedasticity for commute_time alone, but we would reject it for a combination of age and hours_worked.

5.3 Visual Tests

A classic example of heteroscedasticity is a fan shape. We often see this pattern when predicting income by age, or some outcome by time in longitudinal data, where variance increases with our predictor.

Heteroscedasticity can follow other patterns too, such as constantly decreasing variance, or variance that increases then decreases then increases again.

It can also exist when variance is unequal across groups (categorical predictors):

To check the assumption of homoescedasticity visually, first add variables of fitted values and of the square root of the absolute value of the standardized residuals (\(\sqrt{\lvert standardized \; residuals \rvert}\)) to the dataset.

We can then create a scale-location plot, where a violation of homoscedasticity is indicated by a non-flat fitted line. Because we forced all the residuals to be positive by taking their absolute value, instead of looking for whether the band of points is wider or narrow (variance is larger or smaller) at each value of \(x\), we simply look for whether the line goes up or down.

We must plot the residuals against the fitted values and against each of the predictors.

predict yhat
predict res_std, rstandard
gen res_sqrt = sqrt(abs(res_std))
(option xb assumed; fitted values)
(2,684 missing values generated)

(2,684 missing values generated)

(2,684 missing values generated)

Plot res_sqrt against the fitted values.

The residual variance is decidedly non-constant across the fitted values since the conditional mean line goes up and down, suggesting that the assumption of homoscedasticity has been violated.

scatter res_sqrt yhat || ///
    lowess res_sqrt yhat, ///
    legend(off)

The residual variance is decidedly non-constant across the fitted values since the conditional mean line goes up and down, suggesting that the assumption of homoscedasticity has been violated. This matches the conclusion we would draw from the Breusch-Pagan test earlier.

Check the residuals against each predictor. We will just check commute_time, which had a non-significant p-value in our test earlier.

scatter res_sqrt commute_time || ///
    lowess res_sqrt commute_time, ///
    legend(off)

Here, the line is relatively flat, meaning we failed to find evidence of heteroscedasticity. We made the same conclusion earlier with the Breusch-Pagan test where we regressed the residuals on commute_time.

Now, check the residual variance against a categorical predictor, education.

Adding a conditional mean line with a categorical variable requires us to treat the variable as numeric:

scatter res_sqrt education || ///
    lowess res_sqrt education, ///
    legend(off)

The line is not flat, indicating heteroscedasticity across the levels of education.

5.4 Corrective Actions

To address violations of the assumption of homoscedasticity, try the following:

  • Check the other regression assumptions, since a violation of one can lead to a violation of another.
  • Modify the model formula by adding or dropping variables or interaction terms.
  • Fit a generalized linear model.
  • Instead of ordinary least squares regression, use weighted least squares.

After you have applied any corrections or changed your model in any way, you must re-check this assumption and all of the other assumptions.