giacomoV - 4 months ago 37

R Question

I am having a hard time understanding why the confidence intervals are not showing with my data. When I run the same code on another dataset, for example `mtcars`, it works fine. The code is:

```
mtols = mtcars %>% group_by(am) %>% do(lm0 = lm(disp ~ mpg*gear + vs, data=.)) %>%
  augment(., lm0) %>%
  mutate(ymin=.fitted-1.96*.se.fit, ymax=.fitted+1.96*.se.fit)
```

To generate the plot:

```
mtols %>% ggplot(aes(mpg, .fitted) ) +
  geom_smooth(data = mtols, aes(mpg, .fitted, group = gear, colour = gear, fill= gear), method="lm") +
  theme_minimal() + facet_grid(~am)
```

I get the confidence intervals.

However, this doesn't work with my data. Could someone help me figure out what goes wrong here? I would be very grateful.

I compute the OLS fits with:

```
dt = new %>% group_by(day) %>% do(lm0 = lm(y ~ year*class, data=.)) %>% augment(., lm0) %>%
  mutate(ymin=.fitted-1.96*.se.fit, ymax=.fitted+1.96*.se.fit)

# year is a factor with labels "2000"/"2015"; convert via the labels, not the level codes
dt$year = as.numeric(as.character(dt$year))
```

The plot (this is an example with only a few cases, but the result is the same with the whole dataset):

```
dt %>% ggplot(aes(year, .fitted) ) +
  geom_smooth(data = dt, aes(year, .fitted, group = class, colour = class, fill= class), method="lm") +
  theme_bw() + facet_grid(~day)
```

The CIs do not show up. Any clue what I am doing wrong here?

Strangely, when I don't use `facet_grid`, the CIs do appear:

```
dt %>% ggplot(aes(year, .fitted) ) +
  geom_smooth(data = dt, aes(year, .fitted, group = class, colour = class, fill= class), method="lm") +
  theme_bw()
```

A sample of my data:

```
library(broom)
library(dplyr)
library(ggplot2)

new = structure(list(id = structure(c(844084L, 114510L, 14070410L,
942483L, 13190105L, 421369L, 301384L, 251789L, 11011210L, 11280408L,
278575L, 310410L, 16260105L, 11110815L, 18260101L, 14260501L,
10580L, 15090210L, 19140410L, 13230615L, 246511L, 20040812L,
14260114L, 287623L, 16090620L, 20131007L, 835743L, 453390L, 395808L,
363617L), label = "Household identifier", class = c("labelled",
"integer")), year = structure(c(1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L), .Label = c("2000", "2015"), class = "factor"),
day = c("Weekend", "Weekend", "Weekend", "Weekdays", "Weekdays",
"Weekend", "Weekdays", "Weekend", "Weekend", "Weekdays",
"Weekend", "Weekdays", "Weekdays", "Weekend", "Weekend",
"Weekdays", "Weekdays", "Weekend", "Weekdays", "Weekdays",
"Weekdays", "Weekend", "Weekend", "Weekend", "Weekend", "Weekend",
"Weekend", "Weekdays", "Weekdays", "Weekdays"), class = structure(c(1L,
1L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 3L, 2L, 1L, 4L, 1L, 3L,
2L, 3L, 2L, 4L, 2L, 1L, 3L, 2L, 1L, 4L, 3L, 2L, 4L, 1L), .Label = c("Higher Managerial",
"Lower Managerial", "Intermediate", "Manual and Routine"), class = "factor"),
y = c(270, 730, 180, 0, 0, 290, 90, 650, 510, 0, 10, 200,
200, 180, 0, 0, 140, 260, 110, 740, 260, 0, 390, 610, 0,
0, 500, 0, 10, 170)), class = "data.frame", row.names = c(NA,
-30L), .Names = c("id", "year", "day", "class", "y"))
```

Answer

The confidence intervals are being drawn; we just can't see them. Within each `day` facet there are only two unique points for each `class` (one fitted value per year).
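Why only two unique points: when every year-by-class cell has data, a model `y ~ year*class` fitted separately within one `day` has one parameter per cell, so its fitted values are just the cell means. A minimal base-R sketch that checks this on hypothetical toy data (the data frame `toy` and its values are made up for illustration, not the question's `new`):

```r
# Hypothetical toy data shaped like `new`: factor year, two classes, two days.
toy <- expand.grid(year  = factor(c("2000", "2015")),
                   class = factor(c("A", "B")),
                   day   = c("Weekdays", "Weekend"),
                   rep   = 1:5)
toy$y <- seq_len(nrow(toy))  # arbitrary deterministic response

# Fit y ~ year*class separately for each day, as the question's do() step does,
# and store the fitted values.
toy$.fitted <- NA_real_
for (d in levels(toy$day)) {
  idx <- toy$day == d
  toy$.fitted[idx] <- fitted(lm(y ~ year * class, data = toy[idx, ]))
}

# The fitted values equal the year x class x day cell means...
cell_means <- ave(toy$y, toy$year, toy$class, toy$day)
print(all.equal(unname(toy$.fitted), cell_means))

# ...so each class has exactly two distinct fitted values per day (one per year).
n_distinct_fitted <- tapply(round(toy$.fitted, 8),
                            list(toy$class, toy$day),
                            function(v) length(unique(v)))
print(n_distinct_fitted)
```

`geom_smooth()` then fits its own line through those two distinct values per class in each panel.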

```
# Keep one class and plot its fitted values in one facet: only two distinct points
dt2 <- dt %>% filter(class == "Higher Managerial")
plot(.fitted ~ year, data=subset(dt2, day=="Weekend"))
```

We see intervals without the facet because each class line is then fitted through four distinct points (two years for each of the two days), which generally do not fall exactly on a straight line, so the fit has some residual spread. A line through only two distinct points fits them exactly, so the residual standard error is zero and the confidence interval has no width:

```
confint(lm(.fitted ~ year, data=subset(dt2, day=="Weekdays")))
#                   2.5 %      97.5 %
# (Intercept) 9503.333333 9503.333333
# year          -4.666667   -4.666667
```
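The same effect can be reproduced without the question's data. A minimal base-R sketch, assuming `predict(..., interval = "confidence")` is a fair stand-in for how `geom_smooth(method = "lm")` builds its band; the data frames `two_pts` and `four_pts` are hypothetical, with made-up values:

```r
# Two distinct points (repeated y at each x), like one class inside one day facet.
two_pts <- data.frame(x = c(2000, 2000, 2015, 2015),
                      y = c(170, 170, 100, 100))

# Four distinct points, like one class pooled over both days.
four_pts <- data.frame(x = c(2000, 2000, 2015, 2015),
                       y = c(170, 150, 100, 130))

fit2 <- lm(y ~ x, data = two_pts)
fit4 <- lm(y ~ x, data = four_pts)

# Residual standard error: numerically zero for the exact fit, positive otherwise.
print(sigma(fit2))
print(sigma(fit4))

# Width of the 95% confidence band at the observed x values.
ci_width <- function(fit) {
  p <- predict(fit, interval = "confidence")
  p[, "upr"] - p[, "lwr"]
}
print(ci_width(fit2))
print(ci_width(fit4))
```

The four-point fit has a residual standard error of about 18 (the square root of 325), so its band is clearly visible, while the two-point band collapses onto the line.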

**Edit**

Here is a version that uses the `ymin` and `ymax` columns calculated earlier (from `.fitted` and `.se.fit` of the original per-day models) and plots them with `geom_ribbon`:

```
dt %>% ggplot(aes(year, .fitted, group = class, colour = class, fill = class)) +
  geom_line() +                                         # fitted values per class
  geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.2) +   # precomputed 95% bands
  theme_bw() + facet_grid(~day)
```
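These bands come from the uncertainty of the original per-day model rather than from refitting a line to its fitted values, so they keep a positive width even with only two distinct fitted values per class. A minimal base-R sketch of the `ymin`/`ymax` calculation for a single day, assuming broom's `.se.fit` corresponds to `predict.lm(..., se.fit = TRUE)`; the data frame `toy` is hypothetical:

```r
# Hypothetical toy data for one day (not the question's `new`).
toy <- expand.grid(year  = factor(c("2000", "2015")),
                   class = factor(c("A", "B")),
                   rep   = 1:5)
toy$y <- seq_len(nrow(toy))  # arbitrary deterministic response

fit <- lm(y ~ year * class, data = toy)

# Fitted values and their standard errors from the original model.
pr <- predict(fit, se.fit = TRUE)
toy$.fitted <- pr$fit
toy$.se.fit <- pr$se.fit

# Same normal-approximation band as ymin/ymax in the question.
toy$ymin <- toy$.fitted - 1.96 * toy$.se.fit
toy$ymax <- toy$.fitted + 1.96 * toy$.se.fit

# One row per year x class cell.
print(toy[!duplicated(toy[, c("year", "class")]),
          c("year", "class", ".fitted", "ymin", "ymax")])
```

Each band is wider than zero because `y` varies within every year-by-class cell, even though each class still has only two distinct fitted values.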