Endre Endre - 1 month ago 30
R Question

Interaction marginal effects plot with overlay histogram using ggplot2

I would like to create an interaction marginal effects plot where the histogram of the predictor is in the background of the plot. The issue is somewhat complicated since the marginal effects plot is also faceted. I would like the end result to look like something the MARHIS package does in Stata. In case of continuous predictors, I just use

geom_rug
, but that does not work with factors. I would like to use
geom_histogram
but I run into a scaling problem:

ggplot(newdat, aes(cenretrosoc, linetype = factor(wave))) +
geom_line(aes(y = cengovrec), size=0.8) +
scale_linetype_manual(values = c("dotted", "twodash", "solid")) +
geom_line(aes(y = plo,
group=factor(wave)), linetype =3) +
geom_line(aes(y = phi,
group=factor(wave)), linetype =3) +
facet_grid(. ~ regioname) +
xlab("Economy") +
ylab("Disapproval of Record") +
labs(linetype='Wave') +
theme_minimal()


which works, and produces this graph: 1

However, when I add the histogram bit

+ geom_histogram(aes(cenretrosoc), position="identity", linetype=1,
fill="gray60", data = data, alpha=0.5)


this is what happens:2

I think it's because of the different scales the predicted probabilities and the histogram has on the Y axis. But I am not sure how to solve this. Any ideas?

UPDATE:

Here's a reproducible example to illustrate the issue (it needs the WWGbook package for the data it uses)

# install.packages("WWGbook")
# install.packages("lme4")
# install.packages("ggplot2")
require("WWGbook")
require("lme4")
require("ggplot2")
# checking the dataset
head(classroom)

# specifying the model
model <- lmer(mathgain ~ yearstea*sex*minority
+ (1|schoolid/classid), data=classroom)

# dataset for prediction
newdat <- expand.grid(
mathgain = 0,
yearstea = seq(min(classroom$yearstea, rm=TRUE),
max(classroom$yearstea, rm=TRUE),
5),
minority = seq(0, 1, 1),
sex = seq(0,1,1))

mm <- model.matrix(terms(model), newdat)

## calculating the predictions

newdat$mathgain <- predict(model,
newdat, re.form = NA)
pvar1 <- diag(mm %*% tcrossprod(vcov(model), mm))

## Calculating lower and upper CI
cmult <- 1.96
newdat <- data.frame(
newdat, plo = newdat$mathgain - cmult*sqrt(pvar1),
phi = newdat$mathgain + cmult*sqrt(pvar1))

## this is the plot of fixed effects uncertainty

marginaleffect <- ggplot(newdat, aes(yearstea, linetype = factor(sex))) +
geom_line(aes(y = mathgain), size=0.8) +
scale_linetype_manual(values = c("dotted", "twodash")) +
geom_line(aes(y = plo,
group=factor(sex)), linetype =3) +
geom_line(aes(y = phi,
group=factor(sex)), linetype =3) +
facet_grid(. ~ minority) +
xlab("First grade teacher years of teaching experience") +
ylab("Predicted Probability of student gain in math") +
labs(linetype='Sex') +
theme_minimal()


As one can see
marginaleffect
is a plot of marginal effects :) Now I want to add the histogram to the background, so I write:

marginaleffect + geom_histogram(aes(yearstea), position="identity", linetype=1,
fill="gray60", data = classroom, alpha=0.5)


It does add the histogram, but it overwrites the OY scale with the histogram values. In this example one can still see the effects as the original predicted probabilities scale was comparable to the frequencies. However, in my case, a data-set with many values, this is not the case.

Preferably I would not have any scale for the histogram shown. It should just have a maximum which is the predicted probabilities scale maximum, so it covers the same area, but it does not overwrite the pred prob values on the vertical axis.

Answer

Check this post out: https://rpubs.com/kohske/dual_axis_in_ggplot2

I followed the steps and was able to get the following plot:

enter image description here

Also, note that I've added scale_y_continuous(expand = c(0,0)) to make histograms start from the x-axis.

m1 <- marginaleffect + geom_line(aes(y = mathgain), size=0.8) +
    scale_linetype_manual(values = c("dotted", "twodash")) +
    geom_line(aes(y = plo,
                  group=factor(sex)), linetype =3) +
    geom_line(aes(y = phi,
                  group=factor(sex)), linetype =3)

m2 <- marginaleffect + geom_histogram(aes(yearstea), position="identity", linetype=1,
                                fill="gray60", data = classroom, alpha=0.5, bins = 30) +
        scale_y_continuous(expand = c(0,0))


g1 <- ggplot_gtable(ggplot_build(m1))
g2 <- ggplot_gtable(ggplot_build(m2))

library(gtable)

pp <- c(subset(g1$layout, name == "panel", se = t:r))
# The link uses g2$grobs[[...]] but it doesn't seem to work... single bracket works, on the other hand....
g <- gtable_add_grob(g1, g2$grobs[which(g2$layout$name == "panel")], pp$t, pp$l, pp$b, pp$l)

library(grid)
grid.draw(g)