# 1 Setup

This article will teach you how to use `ggpredict()` and `plot()` to visualize the marginal effects of one or more variables of interest in linear and logistic regression models. You will learn how to specify predictor values and how to fix covariates at specific values, in addition to options for customizing plots.

Marginal means are predicted outcomes given certain constraints, and a marginal effect is the predicted change in the outcome after varying a variable of interest while holding others constant.
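To make the two definitions concrete, here is a minimal base R sketch. It uses the built-in `mtcars` data and a small model that are not part of this article, and it computes margins by hand with `predict()` rather than with `ggpredict()`.

```r
# Illustration only: mtcars and this model are stand-ins, not the article's data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Marginal means: predicted mpg at wt = 2 and wt = 3,
# holding hp constant at its sample mean.
grid <- data.frame(wt = c(2, 3), hp = mean(mtcars$hp))
marginal_means <- predict(fit, newdata = grid)

# Marginal effect of wt: the change in the prediction when wt goes
# from 2 to 3 while hp stays fixed.
marginal_effect <- unname(diff(marginal_means))

marginal_means
marginal_effect
```

Because this stand-in model is linear and `wt` is not part of any interaction, the marginal effect equals the `wt` coefficient. When a variable is involved in an interaction, its marginal effect depends on the values of the other variables.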

As our models grow in complexity and dimensionality, we face increasing difficulty in interpreting coefficients. Visualizing margins helps us better understand and communicate our model results.

To begin, load the `ggeffects` and `ggplot2` packages. `ggeffects` provides the `ggpredict()` function, which we will use to calculate margins, and `ggplot2` provides the plotting functions used to draw and customize the plots.

```r
library(ggeffects)
library(ggplot2)
```

Load a sample of the 2019 American Community Survey. The variables in this dataset are described in Regression Diagnostics with R.

``acs <- readRDS(url("https://sscc.wisc.edu/sscc/pubs/data/RegDiag/acs2019sample.rds"))``

Fit a linear model predicting `income` from `age`, `education`, the interaction of the two, `hours_worked` per week, and `sex`.

``mod <- lm(income ~ age * education + hours_worked + sex, acs)``

Now, suppose we want to understand the marginal effect of `age` for females with a bachelor's degree who work 40 hours per week.

View the model estimates with `summary()`:

``summary(mod)``
```
##
## Call:
## lm(formula = income ~ age * education + hours_worked + sex, data = acs)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -104651  -20303   -5457   10053  575865
##
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     -14651.75    7303.23  -2.006 0.044933 *
## age                                321.88     188.29   1.709 0.087479 .
## educationHigh school             -2538.18    8806.58  -0.288 0.773204
## educationSome college           -11156.06    9145.48  -1.220 0.222629
## educationAssociate's degree       6956.78   12208.20   0.570 0.568829
## educationBachelor's degree       -2617.29   10355.82  -0.253 0.800491
## educationAdvanced degree         43542.41   15015.65   2.900 0.003764 **
## hours_worked                      1061.12      72.49  14.637  < 2e-16 ***
## sexFemale                       -15115.84    1966.42  -7.687 2.08e-14 ***
## age:educationHigh school           174.06     213.52   0.815 0.415029
## age:educationSome college          505.64     223.04   2.267 0.023466 *
## age:educationAssociate's degree    154.85     280.84   0.551 0.581415
## age:educationBachelor's degree     813.06     246.26   3.302 0.000973 ***
## age:educationAdvanced degree       329.19     316.10   1.041 0.297781
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49320 on 2747 degrees of freedom
##   (2239 observations deleted due to missingness)
## Multiple R-squared:  0.2391, Adjusted R-squared:  0.2355
## F-statistic: 66.39 on 13 and 2747 DF,  p-value: < 2.2e-16
```

Our model estimates give us the coefficient estimates and significance tests we have been trained to interpret and report, but they are less useful when we want to calculate expected outcomes.
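To see why, the sketch below computes one expected outcome by hand from the coefficients printed above, for a female with a bachelor's degree who works 40 hours per week. The helper `predict_income()` is a hypothetical name introduced for illustration, and the values are copied from the rounded `summary()` output, so results may differ slightly from what `predict()` would return for `mod`.

```r
# Coefficients copied from the summary(mod) output (rounded as printed).
b <- c(
  intercept    = -14651.75,
  age          =    321.88,
  bachelors    =  -2617.29,  # educationBachelor's degree
  hours_worked =   1061.12,
  female       = -15115.84,  # sexFemale
  age_bach     =    813.06   # age:educationBachelor's degree
)

# Hypothetical helper: expected income for a female with a bachelor's
# degree, at a given age and number of weekly hours.
predict_income <- function(age, hours = 40) {
  unname(
    b["intercept"] + b["bachelors"] + b["female"] +
      b["hours_worked"] * hours +
      (b["age"] + b["age_bach"]) * age
  )
}

income_30 <- predict_income(30)
income_31 <- predict_income(31)

income_30              # expected income at age 30
income_31 - income_30  # change for one more year of age
```

Every additional condition or interaction term adds another piece to this sum, which is the kind of bookkeeping that becomes tedious and error-prone by hand.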

`ggpredict()` with `plot()` provides a visual aid for select terms. The two are complementary; neither can fully replace the other.

```r
ggpredict(mod,
          terms = "age",
          condition = c(education = "Bachelor's degree",
                        hours_worked = 40,
                        sex = "Female")) |>
  plot()
```

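This plot shows the marginal effect of `age` on income when we hold `education`, `hours_worked`, and `sex` at specific values. The expected increase in income with age is substantial: for this group, the slope is the sum of the `age` coefficient and the `age:educationBachelor's degree` interaction, 321.88 + 813.06, or about 1,135 in income per additional year of age.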

# 2 Plotting Margins

We will continue to plot margins from `mod`, our regression model fit to the `acs` dataset.

We will use two functions to create margins plots: `ggpredict()` and `plot()`. `ggeffects` has an additional method for `plot()` to create margins plots with `ggplot2`.

Two arguments of `ggpredict()` that we will use are `model` and `terms`. `model` is just the name of our fitted model, `mod`.

`terms` takes a character vector of up to four predictor names, but here we will only use three. The three terms are mapped to the ggplot aesthetics of `x`, `group`, and `facet`, in that order.

• `x` is mapped to the x-axis
• `group`s have different lines, colors, and/or shapes
• `facet`s are separate sub-plots

The three terms take different types of variables.

• `x` can be continuous or categorical
• `group` and `facet` must be categorical, or they will be made categorical
• See Margins at Specific Values to specify how continuous variables are handled (e.g., by using specific values, mean ± SD, etc.)
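As a rough mental model of the three roles, the base R sketch below builds a grid over three predictors and predicts on it. It is not how `ggeffects` is implemented; the `mtcars` data, the model, and the column names `x`, `group`, and `facet` are chosen here only to mirror the roles described above.

```r
# Illustration only: recode two mtcars variables as factors.
cars <- transform(mtcars,
                  cyl = factor(cyl),
                  am  = factor(am, labels = c("automatic", "manual")))
fit <- lm(mpg ~ wt + cyl + am, data = cars)

# One row per combination of the three terms.
grid <- expand.grid(
  wt  = seq(min(cars$wt), max(cars$wt), length.out = 5),  # x: continuous
  cyl = levels(cars$cyl),                                  # group: categorical
  am  = levels(cars$am)                                    # facet: categorical
)
grid$predicted <- predict(fit, newdata = grid)

# Rename to match the roles: x-axis, lines/colors, sub-plots.
names(grid)[1:3] <- c("x", "group", "facet")
head(grid)
```

A plot of this grid would put `x` on the horizontal axis, draw one line per `group` level, and use one panel per `facet` level.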

## 2.1 Terms

Our model has two continuous (`age`, `hours_worked`) and two categorical variables (`education`, `sex`). We can create a range of plots with different numbers and orders of these variables.

One continuous:

``ggpredict(mod, terms = c("age")) |> plot()``

One categorical:

``ggpredict(mod, terms = c("education")) |> plot()``

Two continuous:

``ggpredict(mod, terms = c("age", "hours_worked")) |> plot()``

Two categorical:

``ggpredict(mod, terms = c("education", "sex")) |> plot(colors = "bw")``

One continuous then one categorical:

``ggpredict(mod, terms = c("age", "sex")) |> plot()``

One categorical then one continuous. Compare this plot to the previous plot, noticing how `age` is turned into a categorical variable since it is the `group` term.

``ggpredict(mod, terms = c("sex", "age")) |> plot()``
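One way to picture what happens to a continuous `group` term is sketched below in base R: the variable is reduced to a few representative values, here its mean and one standard deviation on either side, and each value becomes a group. This is an illustration with the built-in `mtcars` data, and the choice of mean ± SD is an assumption of the sketch rather than a statement about the values `ggpredict()` picks in the plot above.

```r
# Illustration only: am as the categorical x term, wt as the continuous group term.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- lm(mpg ~ am + wt, data = cars)

# Reduce continuous wt to three representative values: mean - SD, mean, mean + SD.
wt_values <- mean(cars$wt) + c(-1, 0, 1) * sd(cars$wt)

grid <- expand.grid(am = levels(cars$am), wt = wt_values)
grid$predicted <- predict(fit, newdata = grid)

# Treat the representative values as labels, i.e. as a categorical grouping.
grid$group <- factor(round(grid$wt, 2))
grid
```

Each level of `group` would then get its own color or line, just as a factor such as `sex` does when it is the second term.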