5 Homoscedasticity
What this assumption means: The residuals have equal variance (homoscedasticity) for every value of the fitted values and of the predictors.
Why it matters: Homoscedasticity is necessary to calculate accurate standard errors for parameter estimates.
How to diagnose violations: Visually check plots of residuals against fitted values or predictors for constant variance, and use the BreuschPagan test against heteroscedaticity (nonconstant variance).
How to address it: Modify the model, fit a generalized linear model, or run a weighted least squares regression.
5.1 Example Model
If you have not already done so, download the example dataset, read about its variables, and import the dataset into Stata.
Then, use the code below to fit this page’s example model.
use acs2019sample, clear
reg weeks_worked age hours_worked commute_time i.education
Source  SS df MS Number of obs = 2,316
+ F(8, 2307) = 53.24
Model  35211.0279 8 4401.37849 Prob > F = 0.0000
Residual  190738.537 2,307 82.6781694 Rsquared = 0.1558
+ Adj Rsquared = 0.1529
Total  225949.565 2,315 97.6024038 Root MSE = 9.0928

weeks_worked  Coefficient Std. err. t P>t [95% conf. interval]
+
age  .0474136 .012662 3.74 0.000 .0225835 .0722436
hours_worked  .2584471 .0148671 17.38 0.000 .2292928 .2876014
commute_time  .0037508 .0086126 0.44 0.663 .02064 .0131385

education 
High school  4.762103 .8713987 5.46 0.000 3.053297 6.47091
Some college  3.552516 .8966919 3.96 0.000 1.79411 5.310922
Associate'..  5.876652 .958106 6.13 0.000 3.997813 7.755491
Bachelor's..  4.164039 .9040732 4.61 0.000 2.391158 5.93692
Advanced d..  3.997863 1.026043 3.90 0.000 1.9858 6.009926

_cons  32.34336 1.006814 32.12 0.000 30.36901 34.31772

5.2 Statistical Tests
Use the BreuschPagan test to assess homoscedasticity. The BreuschPagan test regresses the residuals on the fitted values or predictors and checks whether they can explain any of the residual variance. A small pvalue, then, indicates that residual variance is nonconstant (heteroscedastic).
Run BreuschPagan test with estat hettest
. The default is to regress the residuals on the fitted values.
estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of weeks_worked
H0: Constant variance
chi2(1) = 993.46
Prob > chi2 = 0.0000
The small pvalue leads us to reject the null hypothesis of homoscedasticity and infer that the error variance is nonconstant.
After estat hettest
, we can specify one or more variables to test whether the variance is nonconstant for these terms.
estat hettest commute_time
estat hettest age hours_worked
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: commute_time
H0: Constant variance
chi2(1) = 1.88
Prob > chi2 = 0.1698
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variables: age hours_worked
H0: Constant variance
chi2(2) = 910.62
Prob > chi2 = 0.0000
We failed to reject homoscedasticity for commute_time
alone, but we would reject it for a combination of age
and hours_worked
.
5.3 Visual Tests
A classic example of heteroscedasticity is a fan shape. We often see this pattern when predicting income by age, or some outcome by time in longitudinal data, where variance increases with our predictor.
Heteroscedasticity can follow other patterns too, such as constantly decreasing variance, or variance that increases then decreases then increases again.
It can also exist when variance is unequal across groups (categorical predictors):
To check the assumption of homoescedasticity visually, first add variables of fitted values and of the square root of the absolute value of the standardized residuals (\(\sqrt{\lvert standardized \; residuals \rvert}\)) to the dataset.
We can then create a scalelocation plot, where a violation of homoscedasticity is indicated by a nonflat fitted line. Because we forced all the residuals to be positive by taking their absolute value, instead of looking for whether the band of points is wider or narrow (variance is larger or smaller) at each value of \(x\), we simply look for whether the line goes up or down.
We must plot the residuals against the fitted values and against each of the predictors.
predict yhat
predict res_std, rstandard
gen res_sqrt = sqrt(abs(res_std))
(option xb assumed; fitted values)
(2,684 missing values generated)
(2,684 missing values generated)
(2,684 missing values generated)
Plot res_sqrt
against the fitted values.
The residual variance is decidedly nonconstant across the fitted values since the conditional mean line goes up and down, suggesting that the assumption of homoscedasticity has been violated.
scatter res_sqrt yhat  ///
lowess res_sqrt yhat, ///
legend(off)
The residual variance is decidedly nonconstant across the fitted values since the conditional mean line goes up and down, suggesting that the assumption of homoscedasticity has been violated. This matches the conclusion we would draw from the BreuschPagan test earlier.
Check the residuals against each predictor. We will just check commute_time
, which had a nonsignificant pvalue in our test earlier.
scatter res_sqrt commute_time  ///
lowess res_sqrt commute_time, ///
legend(off)
Here, the line is relatively flat, meaning we failed to find evidence of heteroscedasticity. We made the same conclusion earlier with the BreuschPagan test where we regressed the residuals on commute_time
.
Now, check the residual variance against a categorical predictor, education
.
Adding a conditional mean line with a categorical variable requires us to treat the variable as numeric:
scatter res_sqrt education  ///
lowess res_sqrt education, ///
legend(off)
The line is not flat, indicating heteroscedasticity across the levels of education.
5.4 Corrective Actions
To address violations of the assumption of homoscedasticity, try the following:
 Check the other regression assumptions, since a violation of one can lead to a violation of another.
 Modify the model formula by adding or dropping variables or interaction terms.
 Fit a generalized linear model.
 Instead of ordinary least squares regression, use weighted least squares.
After you have applied any corrections or changed your model in any way, you must recheck this assumption and all of the other assumptions.