What this assumption means: After taking into account the predictors in our model, there is no remaining association between observations.
Why it matters: Dependence among observations often reflects omitted variables or ignored data structure, which can bias parameter estimates and standard errors.
How to diagnose violations: Check for associations between residuals and variables not included in the model, including temporal, spatial, and other grouping variables. Inquire about the sampling and data collection methods.
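As a rough illustration of checking residuals against a variable left out of the model, the sketch below simulates data in which observations from the same site share an unmodeled effect, fits a regression that ignores site, and then asks how much of the residual variance lies between sites. The data, the variable names (`site`, `x`, `y`), and the effect sizes are all assumptions made up for this example; NumPy is used only for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 20 sites with 30 observations each.
n_sites, n_per_site = 20, 30
site = np.repeat(np.arange(n_sites), n_per_site)
site_effect = rng.normal(0.0, 2.0, size=n_sites)  # shared by all rows of a site
x = rng.normal(size=site.size)
y = 1.0 + 0.5 * x + site_effect[site] + rng.normal(size=site.size)

# Fit y ~ x by ordinary least squares, leaving `site` out of the model.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Mean residual within each site.
site_means = np.array([resid[site == s].mean() for s in range(n_sites)])

# Share of residual variance that lies between sites. With equal group sizes
# this is between-site variance divided by total residual variance.
between_share = site_means.var() / resid.var()
print(f"between-site share of residual variance: {between_share:.2f}")
```

If the observations were independent given `x`, this share would be expected to sit near `1 / n_per_site` (about 0.03 here); a much larger value suggests that residuals from the same site are associated. A check like this can only detect structure tied to variables you thought to examine.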
How to address it: Add predictors to the model. Fit a model that takes data structure into account (e.g., panel, time series, multilevel).
Statistical and visual tests of independence are usually insufficient on their own. Instead, ask a series of questions about the dataset:
- How was the sample chosen?
- How was the data collected?
- Are any individuals (households, countries, etc.) repeated in the dataset?
- Is it possible that any observations are related temporally?
- Is it possible that any observations are related spatially?
- Is it possible that any observations are related by other, possibly not fully understood, means?
You may find that some observations are related, either through the study design (sampling and data collection methods, or repeated measures design) or naturally (close together in time, space, or other ways). If so, the assumption of independence has been violated.
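Some of these questions can be partly answered from the data itself. The following is a minimal sketch, assuming each record carries a unit identifier and a time stamp; the records, field layout, and values are hypothetical and exist only to show the idea.

```python
from collections import Counter

# Hypothetical records: (unit_id, year, outcome). All values are made up.
records = [
    ("A", 2019, 3.1), ("A", 2020, 3.4), ("B", 2019, 2.2),
    ("C", 2020, 4.0), ("B", 2020, 2.5), ("D", 2019, 1.9),
]

# Units that appear more than once indicate repeated measures.
counts = Counter(unit for unit, _, _ in records)
repeated = sorted(unit for unit, n in counts.items() if n > 1)

# Observations that share a time point may be related temporally.
per_year = Counter(year for _, year, _ in records)

print("repeated units:", repeated)
print("observations per year:", dict(per_year))
```

A tabulation like this flags only the structure that is recorded in the dataset. Relationships that arise from how the sample was chosen or how the data were collected still have to be learned from the study documentation.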
To address a violation of independence:
- Check the other regression assumptions, since a violation of one can lead to a violation of another.
- Modify the model by adding predictors or interactions.
- Make sure that \(X\) is complete: the assumption of independence is that the errors are independent after conditioning on \(X\).
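To see what adding a predictor can do, the sketch below compares two fits on simulated data: one with only `x`, and one that also includes an indicator for each site. The data-generating process, the names `site`, `x`, and `y`, and the helper functions are assumptions introduced for this illustration, not part of any particular analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): 10 sites with 40 observations each.
n_sites, n_per_site = 10, 40
site = np.repeat(np.arange(n_sites), n_per_site)
site_effect = rng.normal(0.0, 2.0, size=n_sites)
x = rng.normal(size=site.size)
y = 1.0 + 0.5 * x + site_effect[site] + rng.normal(size=site.size)

def ols_residuals(X, y):
    """Return residuals from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def max_abs_site_mean(resid):
    """Largest absolute within-site mean of the residuals."""
    return max(abs(resid[site == s].mean()) for s in range(n_sites))

# Model 1: intercept and x only.
X1 = np.column_stack([np.ones_like(x), x])

# Model 2: x plus one indicator column per site
# (the indicators take the place of the intercept).
dummies = (site[:, None] == np.arange(n_sites)[None, :]).astype(float)
X2 = np.column_stack([x, dummies])

before = max_abs_site_mean(ols_residuals(X1, y))
after = max_abs_site_mean(ols_residuals(X2, y))
print(f"largest site-mean residual without site: {before:.3f}")
print(f"largest site-mean residual with site:    {after:.2e}")
```

Including the site indicators forces the residuals to average to zero within each site, so the site-level association disappears by construction. This does not rule out other forms of dependence, such as temporal ordering within a site, and with many groups a multilevel model may be a more practical way to account for the same structure.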
After you have applied any corrections or changed your model in any way, you must re-check this assumption and all of the other assumptions.