7 Multilevel
The files for this page’s example analysis are:

- `04_multilevel.R`: the R script that simulates the data and fits frequentist models
- `04_multilevel.csv`: the data file
- `04_multilevel.imp`: the Blimp script
- `04_multilevel.blimp-out`: the Blimp output file

All files are available at this link.
The following issues are discussed:
- Multilevel models
- Missing data mechanisms: MCAR, MAR, and MNAR
7.1 Example Model
The dataset for this example has 600 observations. 13.2% of x1i, 8.5% of x2i, and 23.5% of y are missing. y is MNAR; its missingness depends on its own unobserved values. x1i is MAR; its missingness depends on the values of x2i. x2i is MCAR; its missingness is unrelated to any observed or unobserved values.
This dataset has 30 clusters (j) at level-two, each with 20 observations at level-one, for a total sample size of 600.
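To make the three mechanisms concrete, here is a hypothetical sketch of how such missingness could be imposed in R. This is not the chapter's `04_multilevel.R`; the variable relationships are invented for illustration, and clustering is ignored for brevity.

```r
set.seed(123)
n   <- 600
x2i <- rnorm(n)                 # fully simulated covariates
x1i <- 0.3 * x2i + rnorm(n)
y   <- x1i + x2i + rnorm(n)

# MNAR: the probability y is missing depends on y's own (unobserved) value
y[rbinom(n, 1, plogis(-1.5 + y)) == 1] <- NA
# MAR: the probability x1i is missing depends on the observed x2i
x1i[rbinom(n, 1, plogis(-2 + x2i)) == 1] <- NA
# MCAR: x2i is missing with a constant probability, unrelated to any values
x2i[rbinom(n, 1, 0.085) == 1] <- NA
```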
Below is a table of estimates from a model predicting y, fit (1) to the complete data and (2) to the data after some values of x1i, x2i, and y were made missing, analyzed with listwise deletion. The true effect of each predictor is around 1.

| Parameter | Estimate (Complete Data) | SE | CI | Estimate (Listwise Deletion) | SE | CI |
|---|---|---|---|---|---|---|
| Intercept | -0.11 | 0.09 | -0.29 – 0.07 | -0.33 | 0.10 | -0.53 – -0.13 |
| x1i | 1.01 | 0.09 | 0.84 – 1.19 | 0.83 | 0.09 | 0.66 – 1.00 |
| x2i | 0.99 | 0.05 | 0.90 – 1.08 | 0.91 | 0.07 | 0.78 – 1.04 |
| x3i | 1.00 | 0.05 | 0.91 – 1.09 | 1.01 | 0.07 | 0.88 – 1.14 |
| x2i × x3i | 1.03 | 0.04 | 0.95 – 1.10 | 1.02 | 0.07 | 0.89 – 1.15 |
| Random Effects | | | | | | |
| σ² | 0.95 | | | 0.90 | | |
| τ00 (j) | 0.20 | | | 0.20 | | |
| τ11 (j.x1i) | 0.18 | | | 0.11 | | |
| ρ01 (j) | 0.59 | | | 0.58 | | |
| ICC | 0.29 | | | 0.22 | | |
| N (j) | 30 | | | 30 | | |
| Observations | 600 | | | 383 | | |
As expected, under MAR and MNAR, estimates are biased. In particular, the estimate for x1i differs by about two standard errors between the two models.
7.2 Blimp Input
The Blimp script for this analysis is as follows:
DATA: 04_multilevel.csv;
CLUSTERID: j;
FIXED: x3i;
MODEL:
focal:
y ~ x1i x2i x3i x2i*x3i | x1i;
auxiliary:
x1i x2i ~ x3i;
y.missing ~ y x1i x2i x3i x2i*x3i;
SEED: 123;
BURN: 20000;
ITERATIONS: 20000;
Select sections of the script are now discussed.
7.2.1 Clustered Data
For a multilevel dataset, specify the cluster variable after the `CLUSTERID` command. All variables in the dataset must be numeric, including the cluster variable, so we may need to first encode a string cluster variable as numeric before fitting our model in Blimp.
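For example, a minimal R sketch of that recoding step (the `school` variable and output filename are hypothetical):

```r
# Recode a character cluster ID as consecutive integers for Blimp
dat <- data.frame(school = rep(c("A", "B", "C"), each = 4), y = rnorm(12))
dat$j <- as.integer(as.factor(dat$school))
write.csv(dat, "clustered_numeric.csv", row.names = FALSE)
```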
7.2.2 Random Effects
A random intercept is included by default whenever a cluster variable is specified. To regress y on x and include a random intercept, we could use this syntax:
CLUSTERID: j;
MODEL: y ~ x;
To add random slopes, include variable names after the `|` symbol in a model. To regress y on x and include a random intercept and also a random slope for x across level-two units, our script would have these lines:
CLUSTERID: j;
MODEL: y ~ x | x;
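For readers coming from R, a rough frequentist analogue of this syntax, written here with lme4 and an assumed data frame `dat` (note that lmer would listwise-delete incomplete rows rather than model the missingness):

```r
# Blimp's `y ~ x | x` corresponds to a random intercept plus a random
# slope for x across clusters j, i.e. lme4's (1 + x | j) term.
library(lme4)
fit <- lmer(y ~ x + (1 + x | j), data = dat)
```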
7.2.3 Factored Regression Specification
Here we again see an interaction between two variables, at least one of which has missing values. We need to specify our predictor model following the procedure discussed in the logistic regression example. x1i has a higher missing rate than x2i, so it appears first in the formula; x3i is fully observed, so it appears on the right side:
x1i x2i ~ x3i;
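Blimp factors this multivariate line into a sequence of univariate regressions, conditioning the variable with the most missing data on the others; these appear as models [2] and [3] in the output below:

$$
f(x_{1i}, x_{2i} \mid x_{3i}) = f(x_{1i} \mid x_{2i}, x_{3i}) \, f(x_{2i} \mid x_{3i})
$$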
7.2.4 Selection Model for MNAR
A second auxiliary model included in our script reads:
y.missing ~ y x1i x2i x3i x2i*x3i;
The `.missing` suffix creates an ordinal missingness indicator, which means this line fits a probit regression.
This is called a selection model: variables in the dataset predict the probability of one variable being missing. Most noteworthy of these predictors is y; a variable predicts its own missingness! A key assumption of the selection model that makes this possible is that the errors are normally distributed.
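To make the probit explicit: writing $R_y$ for the missingness indicator (say, $R_y = 1$ when y is missing), $\Phi$ for the standard normal CDF, and $u_{0j}$ for the model's level-two random intercept, the line above corresponds to a model of roughly the form below. The $\gamma$ coefficients are notation introduced here, not Blimp output.

$$
\Pr(R_y = 1 \mid y, x_{1i}, x_{2i}, x_{3i}) = \Phi\left(\gamma_0 + u_{0j} + \gamma_1 y + \gamma_2 x_{1i} + \gamma_3 x_{2i} + \gamma_4 x_{3i} + \gamma_5\, x_{2i} x_{3i}\right)
$$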
When specifying the selection model, overparameterizing it (i.e., including many variables on the right side) does not bias the estimates, but it can lead to convergence issues.1
Here, we knew in advance that y was MNAR. Outside of simulations, we cannot know whether data are MAR or MNAR: the two mechanisms make different assumptions about the relationship between a missingness indicator and the unobserved values, which is impossible to test without those missing values. Selection models should therefore be used where there is good theoretical reason to do so. Where we are unsure or have competing hypotheses, we can fit models under different assumptions and report any uncertainty in the results.2
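For instance, a minimal sketch of a MAR counterpart to this chapter's script simply omits the selection model and keeps everything else unchanged; comparing its focal estimates against the MNAR version is one way to report that sensitivity:

```
DATA: 04_multilevel.csv;
CLUSTERID: j;
FIXED: x3i;
MODEL:
focal:
y ~ x1i x2i x3i x2i*x3i | x1i;
auxiliary:
x1i x2i ~ x3i;
SEED: 123;
BURN: 20000;
ITERATIONS: 20000;
```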
7.2.5 Iterations
This model is more complex than those in the previous chapters: over 50 parameters, random effects, and a selection model with all variables in the dataset. As a result, our chains need more iterations for our model to converge.
7.3 Blimp Output
Running the Blimp script produces this output file:
---------------------------------------------------------------------------
Blimp
3.2.10
Blimp was developed with funding from Institute of
Education Sciences awards R305D150056 and R305D190002.
Craig K. Enders, P.I. Email: cenders@psych.ucla.edu
Brian T. Keller, Co-P.I. Email: btkeller@missouri.edu
Han Du, Co-P.I. Email: hdu@psych.ucla.edu
Roy Levy, Co-P.I. Email: roy.levy@asu.edu
Programming and Blimp Studio by Brian T. Keller
There is no expressed license given.
---------------------------------------------------------------------------
ALGORITHMIC OPTIONS SPECIFIED:
Imputation method: Fully Bayesian model-based
MCMC algorithm: Full conditional Metropolis sampler with
Auto-Derived Conditional Distributions
Between-cluster imputation model: Latent cluster means (LATENT)
Prior for random effect variances: Zero matrix, df = -(p + 1) (PRIOR2)
Prior for residual variances: Zero sum of squares, df = -2 (PRIOR2)
Prior for predictor variances: Unit sum of squares, df = 2 (XPRIOR1)
Chain Starting Values: Random starting values
BURN-IN POTENTIAL SCALE REDUCTION (PSR) OUTPUT:
NOTE: Split chain PSR is being used. This splits each chain's
iterations to create twice as many chains.
Comparing iterations across 2 chains Highest PSR Parameter #
501 to 1000 1.211 43
1001 to 2000 1.206 43
1501 to 3000 1.082 41
2001 to 4000 1.106 43
2501 to 5000 1.074 51
3001 to 6000 1.055 43
3501 to 7000 1.023 49
4001 to 8000 1.014 26
4501 to 9000 1.030 41
5001 to 10000 1.017 43
5501 to 11000 1.016 41
6001 to 12000 1.017 41
6501 to 13000 1.012 51
7001 to 14000 1.011 43
7501 to 15000 1.009 26
8001 to 16000 1.008 51
8501 to 17000 1.017 43
9001 to 18000 1.025 43
9501 to 19000 1.022 51
10001 to 20000 1.013 38
METROPOLIS-HASTINGS ACCEPTANCE RATES:
Chain 1:
Variable Type Probability Target Value
x1i imputation 0.498 0.500
NOTE: Suppressing printing of 1 chains.
Use keyword 'tuneinfo' in options to override.
DATA INFORMATION:
Level-2 identifier: j
Sample Size: 600
Level-2 Clusters: 30
Missing Data Rates:
y = 23.50
x1i = 13.17
x2i = 08.50
MODEL INFORMATION:
NUMBER OF PARAMETERS
Outcome Models: 25
Predictor Models: 0
PREDICTORS
Fixed variables: x3i
MODELS
focal:
[1] y ~ Intercept x1i x2i x3i x2i*x3i | Intercept x1i
auxiliary:
[2] x1i ~ Intercept x2i x3i
[3] x2i ~ Intercept x3i
[4] y.missing ~ Intercept y x1i x2i x3i x2i*x3i
WARNING MESSAGES:
No warning messages.
MODEL FIT:
INFORMATION CRITERIA
Conditional Likelihood
DIC2 6837.722
WAIC 7167.662
CORRELATIONS AMONG RESIDUALS:
Summaries based on 20000 iterations using 2 chains.
Level-1:
Correlations Median StdDev 2.5% 97.5% PSR N_Eff
-------------------------------------------------------------------
y, x1i 0.000 0.058 -0.112 0.114 1.000 20213.905
y, x2i 0.001 0.058 -0.113 0.113 1.000 19890.143
y, y.missing -0.008 0.057 -0.120 0.104 1.000 17149.392
x1i, x2i 0.001 0.058 -0.113 0.113 1.000 19253.786
x1i, y.missing 0.001 0.058 -0.114 0.114 1.000 19683.078
x2i, y.missing 0.001 0.057 -0.112 0.112 1.000 18028.872
-------------------------------------------------------------------
OUTCOME MODEL ESTIMATES:
Summaries based on 20000 iterations using 2 chains.
focal block:
Outcome Variable: y
Parameters Median StdDev 2.5% 97.5% PSR N_Eff
-------------------------------------------------------------------
Variances:
L2 : Var(Intercept) 0.266 0.114 0.126 0.565 1.001 4787.825
L2 : Cov(x1i,Intercept) 0.145 0.085 0.029 0.359 1.000 4039.526
L2 : Var(x1i) 0.213 0.101 0.090 0.479 1.000 3040.017
Residual Var. 0.941 0.077 0.806 1.108 1.001 2182.799
Coefficients:
Intercept -0.142 0.121 -0.375 0.104 1.007 785.488
x1i 0.950 0.109 0.739 1.175 1.007 1080.673
x2i 0.995 0.064 0.869 1.121 1.001 1506.892
x3i 1.048 0.063 0.924 1.172 1.001 1570.699
x2i*x3i 1.062 0.063 0.939 1.186 1.000 795.070
Standardized Coefficients:
x1i 0.338 0.032 0.271 0.398 1.006 1427.634
x2i 0.344 0.020 0.304 0.383 1.002 3983.370
x3i 0.357 0.020 0.317 0.395 1.002 3750.748
x2i*x3i 0.397 0.025 0.352 0.448 1.005 791.280
Proportion Variance Explained
by Coefficients 0.822 0.021 0.772 0.854 1.001 3263.953
by Level-2 Random Intercepts 0.033 0.013 0.016 0.068 1.001 4853.665
by Level-2 Random Slopes 0.027 0.012 0.012 0.058 1.000 3607.067
by Level-1 Residual Variation 0.115 0.011 0.096 0.138 1.001 1735.684
-------------------------------------------------------------------
auxiliary block:
Outcome Variable: x1i
Parameters Median StdDev 2.5% 97.5% PSR N_Eff
-------------------------------------------------------------------
Variances:
L2 : Var(Intercept) 0.008 0.010 0.000 0.038 1.003 744.775
Residual Var. 0.782 0.049 0.693 0.886 1.000 9679.504
Coefficients:
Intercept 0.025 0.043 -0.061 0.109 1.000 7721.251
x2i 0.330 0.043 0.245 0.417 1.000 6697.415
x3i 0.294 0.043 0.210 0.377 1.000 6863.699
Standardized Coefficients:
x2i 0.319 0.039 0.241 0.394 1.000 6886.435
x3i 0.280 0.039 0.202 0.354 1.000 7003.486
Proportion Variance Explained
by Coefficients 0.242 0.031 0.181 0.303 1.000 7237.416
by Level-2 Random Intercepts 0.007 0.010 0.000 0.036 1.003 729.177
by Level-1 Residual Variation 0.748 0.032 0.685 0.810 1.000 4840.951
-------------------------------------------------------------------
Outcome Variable: x2i
Parameters Median StdDev 2.5% 97.5% PSR N_Eff
-------------------------------------------------------------------
Variances:
L2 : Var(Intercept) 0.009 0.011 0.000 0.042 1.002 860.894
Residual Var. 0.869 0.053 0.776 0.980 1.000 12574.055
Coefficients:
Intercept -0.040 0.044 -0.126 0.048 1.000 10571.698
x3i 0.340 0.041 0.259 0.419 1.001 4730.902
Standardized Coefficients:
x3i 0.333 0.037 0.257 0.401 1.001 4387.780
Proportion Variance Explained
by Coefficients 0.111 0.024 0.066 0.161 1.001 4673.270
by Level-2 Random Intercepts 0.009 0.011 0.000 0.041 1.002 850.700
by Level-1 Residual Variation 0.878 0.026 0.824 0.925 1.000 2439.786
-------------------------------------------------------------------
Latent Variable: y.missing
Parameters Median StdDev 2.5% 97.5% PSR N_Eff
-------------------------------------------------------------------
Variances:
L2 : Var(Intercept) 0.089 0.088 0.006 0.336 1.007 700.322
Residual Var. 1.000 0.000 1.000 1.000 nan nan
Coefficients:
Intercept -1.312 0.153 -1.657 -1.051 1.003 415.924
y 0.547 0.163 0.224 0.874 1.001 320.824
x1i 0.192 0.188 -0.178 0.557 1.000 543.809
x2i 0.016 0.206 -0.364 0.439 1.004 321.074
x3i -0.134 0.188 -0.509 0.240 1.001 385.638
x2i*x3i -0.339 0.227 -0.837 0.058 1.011 261.477
Standardized Coefficients:
y 0.844 0.221 0.355 1.221 1.000 331.282
x1i 0.106 0.105 -0.091 0.316 1.000 508.420
x2i 0.009 0.108 -0.183 0.238 1.003 333.876
x3i -0.071 0.096 -0.245 0.131 1.001 403.082
x2i*x3i -0.199 0.115 -0.420 0.036 1.011 286.297
Proportion Variance Explained
by Coefficients 0.673 0.054 0.564 0.773 1.002 329.050
by Level-2 Random Intercepts 0.026 0.024 0.002 0.091 1.005 684.396
by Level-1 Residual Variation 0.295 0.050 0.204 0.396 1.007 326.634
-------------------------------------------------------------------
Select sections of the output are now discussed.
7.3.1 Outcome Models
This model, with this simulated dataset, with this seed, did remarkably well at recovering the true estimates. Notice all those qualifiers. In preparing these materials, many iterations of these models were tested and reconfigured.
Did every model produce posterior parameter distributions with median 1? No.
Were the models almost always more accurate than the listwise deletion model? Yes.
It was not just the more obvious parameter estimates that moved closer to their true values, but also parameters like the random-effect variances. In some cases, R's frequentist models had singular fits, estimating the random intercept-slope correlation as a perfect 1.00, while Blimp recovered the true correlation (~0.5).
7.4 Exercise
Fit a multilevel model with the dataset `exercise_04.csv`. (All files are available at this link.) Regress y on x1i, x2i, and x3i. You have reason to believe x1i is missing not at random and that x2i is missing conditional on x3i. The groups are given as j. Allow the slope of x3i to vary across groups. The coefficients on all three predictors should be around 1.