Panel Models in Stata and R
1 Introduction
This page helps you refit panel models from Stata in R and explains why standard errors (SEs) can differ between the two. Translating in the other direction, from R to Stata, is likely to be less successful, because Stata package authors are less likely than R package authors to explicitly reproduce methods unique to other software packages.
The example code in the tables below is written with Stata-like terminology. It assumes you have some dataset dat with panel variable panelvar, time variable timevar, dependent variable depvar, any number of independent variables indepvars, and some other group variable groupvar. Substitute each of these with the names of the variables in your particular dataset.
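If you want to try the snippets on this page before using your own data, a toy dataset with the placeholder names can be simulated. This is only a sketch: the sizes, coefficients, and the single predictor x (standing in for indepvars) are made up for illustration.

```r
set.seed(42)
n_panels  <- 30  # number of panel units (hypothetical)
n_periods <- 5   # observations per unit (hypothetical)

dat <- data.frame(
  panelvar = rep(seq_len(n_panels), each = n_periods),
  timevar  = rep(seq_len(n_periods), times = n_panels)
)

# groupvar: a coarser grouping, each panel unit nested in one of 6 groups
dat$groupvar <- (dat$panelvar - 1) %/% 5 + 1

# x stands in for indepvars
dat$x <- rnorm(nrow(dat))

# depvar: unit-specific intercept plus a slope on x plus noise
unit_effect <- rnorm(n_panels)
dat$depvar <- 2 + 0.5 * dat$x + unit_effect[dat$panelvar] + rnorm(nrow(dat))

str(dat)
```

With this data, replace indepvars with x in the formulas below.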
The functions in the R code require you to install and load the plm, lmtest, sandwich, and clubSandwich packages. (coeftest() is provided by lmtest.)
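One way to check for and load these packages is sketched below. Note that coeftest() is a function provided by the lmtest package, so lmtest is what gets loaded. The snippet only reports missing packages and leaves installation to you.

```r
pkgs <- c("plm", "lmtest", "sandwich", "clubSandwich")

# TRUE for each package that is already installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Not installed yet: ", paste(pkgs[!installed], collapse = ", "),
          "\nInstall them with install.packages() and rerun.")
}

# Attach the packages that are available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```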
2 Panel Model Equivalents
2.1 Fixed effects
Stata:
xtset panelvar
xtreg depvar indepvars, fe
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "within")
2.1.1 SEs clustered by panelvar
Stata:
xtset panelvar
xtreg depvar indepvars, fe vce(cluster panelvar)
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "within")
n_groups <- length(unique(dat$panelvar))
adj <- n_groups / (n_groups - 1)
coeftest(mod,
adj * vcovHC(mod, type = "HC1"))
See notes on finite sample size adjustments and degrees of freedom.
2.1.2 SEs clustered by groupvar
Stata:
xtset panelvar
xtreg depvar indepvars, fe vce(cluster groupvar)
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "within")
coeftest(mod,
vcovCR(mod,
type = "CR1S",
cluster = dat$groupvar))
See notes on finite sample size adjustments, SEs clustered by groupvar, and degrees of freedom.
2.2 Random effects
2.2.1 Balanced
Stata:
xtset panelvar
xtreg depvar indepvars, re
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "random")
2.2.2 Unbalanced
Stata:
xtset panelvar
xtreg depvar indepvars, re sa
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "random",
random.models = c("within", "between"))
R’s default is the Swamy and Arora model, which Stata fits when you add the sa option.
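To decide which of the two cases applies, you can check whether every panel unit is observed in every time period. The base R sketch below uses a small made-up dataset in which unit 3 is missing one period. It assumes there are no duplicate panelvar-timevar rows. The plm package also provides is.pbalanced() for the same purpose.

```r
# Made-up example: unit 3 has only two of the three periods
dat <- data.frame(
  panelvar = c(1, 1, 1, 2, 2, 2, 3, 3),
  timevar  = c(1, 2, 3, 1, 2, 3, 1, 2)
)

obs_per_panel <- table(dat$panelvar)      # rows per panel unit
n_periods <- length(unique(dat$timevar))  # distinct time periods

# Balanced if every unit has one row for each period
is_balanced <- all(obs_per_panel == n_periods)
is_balanced
```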
2.2.3 SEs clustered by panelvar
Stata:
xtset panelvar
xtreg depvar indepvars, re vce(cluster panelvar)
R:
mod <-
plm(depvar ~ indepvars,
dat,
index = "panelvar",
model = "random")
coeftest(mod,
vcovHC(mod,
type = "sss"))
See note on finite sample size adjustments.
2.2.4 SEs clustered by groupvar
Stata:
xtset panelvar
xtreg depvar indepvars, re vce(cluster groupvar)
R has no equivalent.
See note on SEs clustered by groupvar.
3 Doing More
3.1 Including timevar
In Stata, timevar is included in the initial xtset: xtset panelvar timevar.
In R, timevar must be added to the index argument of plm(). Supply index with a vector of panelvar and timevar: plm(..., index = c("panelvar", "timevar")).
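A minimal sketch of the two-variable index is shown below, using made-up data and a single hypothetical predictor x. The model is fit only if plm is installed. The duplicate check is an extra precaution added here, on the assumption that each panelvar-timevar pair should identify exactly one row.

```r
# Toy panel: 5 units observed over 4 periods (made-up data)
set.seed(1)
dat <- data.frame(
  panelvar = rep(1:5, each = 4),
  timevar  = rep(2001:2004, times = 5),
  x        = rnorm(20)
)
dat$depvar <- 1 + 0.5 * dat$x + rnorm(20)

# 0 means no panelvar-timevar combination appears more than once
n_dup <- anyDuplicated(dat[c("panelvar", "timevar")])

if (requireNamespace("plm", quietly = TRUE)) {
  mod <- plm::plm(depvar ~ x,
                  data = dat,
                  index = c("panelvar", "timevar"),
                  model = "within")
  print(coef(mod))
}
```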
3.2 Including Multiple Fixed Effects
If you are fitting a model with many fixed effects with reghdfe, see the R package lfe, but note that the package is no longer being maintained.
4 Notes
4.1 Finite sample size adjustments
Stata’s xtreg applies a correction to standard errors for finite sample sizes, while R does not. Multiplying R’s variance-covariance matrix by an adjustment factor, such as \(\frac{\text{n_groups}}{\text{n_groups} - 1}\), will make R’s SEs the same as, or at least very close to, Stata’s SEs.
reghdfe, on the other hand, produces SEs identical to plm()’s default, so it is an alternative to xtreg for fixed effects models. Note, however, that reghdfe only supports fixed effects models.
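To see what the n_groups / (n_groups - 1) factor does to the SEs, the sketch below applies it to a made-up variance-covariance matrix. The numbers are hypothetical and the snippet only illustrates the arithmetic. It is not a reproduction of Stata’s exact formula. Because SEs are square roots of the diagonal, scaling the matrix by adj scales every SE by sqrt(adj).

```r
n_groups <- 50                    # hypothetical number of panels
adj <- n_groups / (n_groups - 1)  # adjustment factor from the text

# Made-up variance-covariance matrix for two coefficients
V <- matrix(c(0.04, 0.01,
              0.01, 0.09),
            nrow = 2,
            dimnames = list(c("x1", "x2"), c("x1", "x2")))

se_unadjusted <- sqrt(diag(V))
se_adjusted   <- sqrt(diag(adj * V))

# The ratio is sqrt(adj) for every coefficient
se_adjusted / se_unadjusted
```

With few groups the factor matters more: it shrinks toward 1 as n_groups grows.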
4.2 SEs clustered by groupvar
Fixed effects models: I have not been able to figure out why the SEs slightly differ for Stata and R, even though it appears they are applying the same adjustment to the SEs.
Random effects models: As of this writing, plm, sandwich, and clubSandwich do not support clustering SEs by groups that were not included in the random effects panel model.
4.3 Degrees of freedom
Stata and R use different degrees of freedom for clustered standard errors. While the SEs and t-values will match, the p-values and confidence intervals will not. Stata uses the number of groups minus one, and R uses the number of observations minus the number of groups minus the number of predictors in the model.
To manually calculate Stata’s and R’s p-values for some t-value (tvalue), adapt the code below.
g <- length(unique(dat$panelvar)) # number of groups (panels)
n <- nobs(mod) # number of observations
k <- length(coef(mod)) # number of predictors in the model
df_stata <- g - 1
df_r <- n - g - k
pt(abs(tvalue), df_stata, lower.tail = FALSE) * 2 # Stata's p-value
pt(abs(tvalue), df_r, lower.tail = FALSE) * 2 # R's p-value
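The same two degrees of freedom can be used to compare confidence intervals. The sketch below uses made-up values for the estimate, SE, and sample sizes, so it shows only how the interval width changes with the degrees of freedom.

```r
estimate <- 0.50  # hypothetical coefficient
se       <- 0.20  # hypothetical clustered SE
g <- 30           # number of groups (hypothetical)
n <- 300          # number of observations (hypothetical)
k <- 3            # number of predictors (hypothetical)

df_stata <- g - 1
df_r     <- n - g - k

# 95% confidence intervals: estimate +/- critical t-value * SE
ci_stata <- estimate + c(-1, 1) * qt(0.975, df_stata) * se
ci_r     <- estimate + c(-1, 1) * qt(0.975, df_r) * se

rbind(stata = ci_stata, r = ci_r)
```

Because Stata’s degrees of freedom are smaller here, its critical t-value is larger and its interval is wider.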