Interactive mode: click a code block or Show Plot button to reveal/hide its corresponding plot.
library(haven)
library(tidyverse)
df<-read_dta("CMPS_2016.dta")
After fitting a model, it is useful generate model-based estimates (expected values, or adjusted predictions) of the response variable for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables - adjusted predictions tell you: what is the expected ouctome for certain values or levels of my predictors?
ggeffects is a handy package that aims at easily calculating adjusted predictions at meaningful values of covariates from statistical models.
#if (FALSE) install.packages("ggeffects")
library(ggeffects)
df$C246
## <labelled<double>[10144]>: Immigrants (How much discrimination is there in the United States today against
## [1] 5 5 1 1 1 1 1 2 1 2 1 3 1 2 1 2 1 1 1 1 3 1 1 2 2 1 3 2 2 5 2 1 3 5 5 1 1 2 1 1 3 1 3 1 3 2 3
## [48] 1 2 2 3 1 1 2 2 5 1 5 1 4 1 1 1 3 2 1 5 1 1 1 3 1 1 5 1 3 2 1 2 1 3 1 3 1 5 2 2 1 2 3 2 1 1 1
## [95] 3 5 2 3 2 5 3 4 3 5 2 3 2 1 2 1 1 2 3 1 5 2 2 3 2 1 1 1 2 1 2 5 2 3 2 3 1 5 1 2 1 1 1 1 2 5 2
## [142] 2 2 1 1 5 2 2 1 5 1 3 5 1 1 1 4 3 1 1 2 1 5 1 1 3 3 1 1 4 1 1 1 1 1 1 1 2 1 1 1 1 3 5 2 5 2 1
## [189] 1 2 5 5 3 2 2 2 3 3 1 3 2 1 2 2 1 1 2 1 1 1 1 2 4 1 2 1 1 1 1 1 3 2 2 5 2 1 1 1 1 1 4 3 4 1 2
## [236] 2 1 1 3 1 5 1 1 3 2 2 3 2 5 1 1 1 4 3 1 4 1 1 3 4 1 3 4 4 2 3 1 2 1 1 5 5 3 1 1 1 2 2 3 3 3 5
## [283] 1 1 2 1 2 2 1 1 1 1 2 1 1 2 5 1 2 1 2 5 1 1 1 1 1 2 2 2 1 3 1 2 1 1 2 1 1 2 2 1 1 2 1 1 1 1 1
## [330] 3 1 1 3 2 1 1 2 2 2 5 1 2 1 4 1 2 2 4 2 3 3 3 2 1 1 4 2 3 2 1 1 2 5 1 1 1 2 3 1 2 2 1 4 1 2 3
## [377] 1 2 1 1 1 3 3 2 2 2 3 3 2 2 2 2 2 1 5 3 1 2 2 1 1 1 1 1 3 2 2 2 3 1 1 2 2 2 3 3 3 1 2 1 4 1 2
## [424] 1 3 3 1 2 2 1 2 2 1 1 1 5 1 3 5 2 2 5 5 1 1 1 1 2 2 1 1 2 2 3 1 1 1 5 5 2 3 1 1 1 4 2 2 1 4 1
## [471] 5 2 1 1 1 1 5 3 1 2 1 1 1 5 1 2 1 2 1 2 3 2 1 2 2 1 1 1 5 1 2 2 1 2 1 1 1 1 1 1 2 2 3 1 2 1 1
## [518] 1 2 1 1 1 2 1 2 5 1 1 1 2 3 3 1 5 2 1 1 1 2 2 1 5 2 1 2 2 4 4 5 2 1 1 2 1 1 1 5 2 1 3 1 2 1 1
## [565] 2 1 1 2 1 1 2 1 3 1 3 1 2 3 1 1 1 1 3 2 2 1 2 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1 3 1 1 5 3 1 1 1 3
## [612] 2 1 2 1 2 3 1 1 1 2 1 1 1 2 2 1 3 5 1 1 1 2 1 3 1 2 3 2 3 1 1 3 1 1 1 2 1 2 1 1 1 1 2 1 1 1 3
## [659] 2 1 3 1 2 1 1 3 1 2 5 1 1 1 1 3 2 3 2 1 1 1 1 5 2 3 1 1 1 1 2 2 2 2 1 1 3 3 1 1 2 2 3 1 2 1 5
## [706] 3 1 3 1 4 1 1 1 3 1 2 1 5 3 1 1 1 2 1 1 4 1 1 1 5 1 1 1 5 1 1 2 1 1 4 2 1 4 5 2 1 1 1 1 2 2 5
## [753] 3 1 5 1 3 3 1 1 1 2 1 2 1 1 1 2 5 2 2 1 1 1 2 1 3 1 1 1 1 1 1 1 2 1 1 1 2 1 1 3 1 1 5 3 2 3 1
## [800] 1 1 3 1 2 2 1 1 3 1 3 1 2 1 2 1 1 1 1 1 1 1 1 2 5 1 1 1 2 5 1 1 2 3 1 1 2 1 5 1 1 1 2 1 5 1 5
## [847] 2 1 1 5 1 2 1 3 1 1 1 1 1 5 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 3 1 1 5 1 1 1 2 2 1 2 5 1 1 5 2 1 1
## [894] 1 2 2 5 1 1 2 1 3 2 1 2 1 2 1 1 1 1 1 1 1 2 5 2 1 1 1 3 2 5 3 1 1 2 1 3 1 4 5 1 2 1 1 1 3 3 1
## [941] 1 1 2 2 1 1 1 1 2 1 1 3 5 4 2 2 2 1 1 2 1 5 1 1 1 1 1 1 1 1 5 1 1 1 1 1 5 2 1 3 1 1 4 1 2 3 2
## [988] 3 1 1 1 1 1 1 1 2 4 2 1 1 5 1 1 1 1 1 1 1 1 1 2 2 1 1 3 1 2 3 2 4 5 1 1 1 1 4 5 1 1 5 1 5 1 3
## [1035] 1 1 2 5 1 3 1 1 1 3 3 5 2 1 5 1 2 1 1 5 3 2 3 1 1 1 1 4 5 3 1 1 2 2 1 2 1 1 1 1 1 2 1 1 5 1 1
## [1082] 1 1 1 2 1 2 1 1 1 1 2 1 1 2 1 1 4 3 1 5 1 1 4 1 5 1 1 3 1 1 2 5 4 1 3 2 1 1 1 1 3 2 5 2 5 2 1
## [1129] 1 1 2 5 1 2 5 1 1 2 5 2 3 1 1 5 1 1 3 1 4 2 2 1 1 2 1 1 1 2 3 1 3 1 2 1 1 1 1 3 2 1 2 5 5 5 4
## [1176] 1 1 1 1 2 2 1 2 1 1 2 1 2 2 3 2 2 1 1 2 1 1 2 1 2
## [ reached 'max' / getOption("max.print") -- omitted 8944 entries ]
##
## Labels:
## value label
## 1 A lot
## 2 Some
## 3 A little
## 4 None at all
## 5 Don't know
data <- df %>%
filter(C246<5)%>%
mutate(
discrimination = 4 - C246,
sex = case_when(
S3 == 1 ~ "Female",
S3 == 2 ~ "Male",
TRUE ~ NA_character_
),
White = ifelse(S2_1==1,"White","Minority"),
age = AGE
)
# Convert 'race' to a factor and set levels
data$White <- factor(data$White,
levels = c("White",
"Minority"))
data$sex <- factor(data$sex)
After fitting a model, it’s often helpful to estimate how the
response variable changes for different levels of a categorical
predictor. These predictions allow us to assess the adjusted
relationship between a categorical variable and the outcome, holding
other predictors constant. For example, we might want to know how
discrimination scores differ by race or sex while controlling for other
factors such as age. Using ggeffects, we can calculate and
visualize these predictions efficiently.
model1<-glm(discrimination~sex+White+age,data=data)
summary(model1)
##
## Call:
## glm(formula = discrimination ~ sex + White + age, data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3919348 0.0352811 67.797 < 2e-16 ***
## sexMale -0.2149141 0.0177212 -12.128 < 2e-16 ***
## WhiteMinority 0.1526756 0.0261176 5.846 5.21e-09 ***
## age -0.0030184 0.0005487 -5.501 3.87e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.6633938)
##
## Null deviance: 6418.1 on 9434 degrees of freedom
## Residual deviance: 6256.5 on 9431 degrees of freedom
## (23 observations deleted due to missingness)
## AIC: 22909
##
## Number of Fisher Scoring iterations: 2
model1_pred<-ggpredict(model1,"White",ci_level = 0.95)
plot(model1_pred)
plot(model1_pred)+ylim(2,2.5)+theme_classic()
In this section, we explore how a continuous predictor influences the response variable. Adjusted predictions allow us to understand the expected outcome across a range of values for the continuous variable, holding other factors constant.
model1_pred2<-ggpredict(model1,"age",ci_level = 0.95)
plot(model1_pred2)+ylim(1.8,2.8)
Interactions between predictors can reveal more complex relationships, where the effect of one predictor depends on the level of another. Visualizing predicted values in the context of interactions helps us interpret these conditional effects effectively.
Interactions between categorical and continuous predictors allow us to assess whether the relationship between the continuous variable and the outcome varies across groups. For example, we can examine whether the effect of age on discrimination differs by sex.
model2<-lm(discrimination~sex*age+White,data=data)
summary(model2)
##
## Call:
## lm(formula = discrimination ~ sex * age + White, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5050 -0.4199 0.4988 0.6208 1.0230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4176445 0.0387774 62.347 < 2e-16 ***
## sexMale -0.2898118 0.0501322 -5.781 7.66e-09 ***
## age -0.0036970 0.0006939 -5.328 1.02e-07 ***
## WhiteMinority 0.1538525 0.0261259 5.889 4.02e-09 ***
## sexMale:age 0.0017881 0.0011196 1.597 0.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8144 on 9430 degrees of freedom
## (23 observations deleted due to missingness)
## Multiple R-squared: 0.02544, Adjusted R-squared: 0.02503
## F-statistic: 61.55 on 4 and 9430 DF, p-value: < 2.2e-16
model2_pred<-ggpredict(model2,c("age","sex"),ci_level = 0.95)
plot(model2_pred)+ylim(1.8,2.5)
When two categorical predictors interact, we can explore how their combined levels influence the response variable. This analysis helps to determine whether the relationship between one categorical variable and the outcome depends on the levels of another categorical variable. For example, we can estimate how discrimination scores vary by the interaction of race and sex.
model3<-lm(discrimination~sex*White+age,data=data)
summary(model3)
##
## Call:
## lm(formula = discrimination ~ sex * White + age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4841 -0.4156 0.5189 0.6171 1.1296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4310918 0.0390272 62.292 < 2e-16 ***
## sexMale -0.3254683 0.0503743 -6.461 1.09e-10 ***
## WhiteMinority 0.1065849 0.0326849 3.261 0.00111 **
## age -0.0029771 0.0005488 -5.425 5.95e-08 ***
## sexMale:WhiteMinority 0.1258074 0.0536620 2.344 0.01908 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8143 on 9430 degrees of freedom
## (23 observations deleted due to missingness)
## Multiple R-squared: 0.02575, Adjusted R-squared: 0.02534
## F-statistic: 62.31 on 4 and 9430 DF, p-value: < 2.2e-16
model3_pred<-ggpredict(model3,c("White","sex"),ci_level = 0.95)
plot(model3_pred)+ylim(1.5,2.5)