Daniel Anderson - 3 months ago 25

R Question

My question is similar to this but the answers there will not work for me. Basically, I'm trying to produce a regression discontinuity plot with a "fuzzy" design that uses all the data for the treatment and control groups, but only plots the regression line within the "range" of the treatment and control groups.

Below, I've simulated some data and produced the fuzzy RD plot with base graphics. I'm hoping to replicate this plot with ggplot2. Note that the most important part of this is that the light blue regression line is fit using all the blue points, while the peach colored regression line is fit using all the red points, despite only being plotted over the ranges in which individuals were intended to receive treatment. That's the part I'm having a hard time replicating in ggplot.

I'd like to move to ggplot because I'd like to use faceting to produce this same plot across various units in which participants were nested. In the code below, I show a non-example using

`geom_smooth`

`geom_smooth`

`library(MASS)`

mu <- c(0, 0)

sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)

set.seed(100)

d <- as.data.frame(mvrnorm(1e3, mu, sigma))

# Create treatment variable

d$treat <- ifelse(d$V1 <= 0, 1, 0)

# Introduce fuzziness

d$treat[d$treat == 1][sample(100)] <- 0

d$treat[d$treat == 0][sample(100)] <- 1

# Treatment effect

d$V2[d$treat == 1] <- d$V2[d$treat == 1] + 0.5

# Add grouping factor

d$group <- gl(9, 1e3/9)

`library(RColorBrewer)`

pal <- brewer.pal(5, "RdBu")

color <- d$treat

color[color == 0] <- pal[1]

color[color == 1] <- pal[5]

plot(V2 ~ V1,

data = d,

col = color,

bty = "n")

abline(v = 0, col = "gray", lwd = 3, lty = 2)

# Fit model

m <- lm(V2 ~ V1 + treat, data = d)

# predicted achievement for treatment group

pred_treat <- predict(m,

newdata = data.frame(V1 = seq(-3, 0, 0.1),

treat = 1))

# predicted achievement for control group

pred_no_treat <- predict(m,

newdata = data.frame(V1 = seq(0, 4, 0.1),

treat = 0))

# Add predicted achievement lines

lines(seq(-3, 0, 0.1), pred_treat, col = pal[4], lwd = 3)

lines(seq(0, 4, 0.1), pred_no_treat, col = pal[2], lwd = 3)

# Add legend

legend("bottomright",

legend = c("Treatment", "Control"),

lty = 1,

lwd = 2,

col = c(pal[4], pal[2]),

box.lwd = 0)

`d$treat <- factor(d$treat, labels = c("Control", "Treatment"))`

library(ggplot2)

ggplot(d, aes(V1, V2, group = treat)) +

geom_point(aes(color = treat)) +

geom_smooth(method = "lm", aes(color = treat)) +

facet_wrap(~group)

Notice the regression lines extending past the treatment range for groups 1 and 2.

Answer

There's probably a more graceful way to make the lines with `geom_smooth`

, but it can be hacked together with `geom_segment`

. Munge the data.frames outside of the plotting call if you like.

```
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(-3, 0), treat = 1)))),
aes(x = -3, xend = 0, y = X1, yend = X2), color = pal[4], size = 1) +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(0, 4), treat = 0)))),
aes(x = 0, xend = 4, y = X1, yend = X2), color = pal[2], size = 1)
```

Another option is `geom_path`

:

```
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c(1, 1, 0, 0))
df <- cbind(df, V2 = predict(m, df))
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_path(data = df, size = 1)
```