Alex Coppock - 4 months ago
R Question

tidyr::pop_quiz: is there a faster/more transparent way to reshape the anscombe dataset?

I'm trying to get good with tidyr. Is there a better way to prep the anscombe dataset for plotting with ggplot2? Specifically, I don't love having to add data (obs_num). How would you do this?


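library(dplyr)
library(tidyr)
library(ggplot2)

anscombe %>%
  mutate(obs_num = 1:n()) %>%                       # row id so spread() can pair each x with its y
  gather(variable, value, -obs_num) %>%
  separate(variable, c("variable", "set"), 1) %>%   # "x1" becomes variable "x", set "1"
  spread(variable, value) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~ set)   # assumed: the line after the trailing "+" was cut off; one panel per set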


I think you need to add the extra column in order to uniquely identify each observation in the call to spread. Hadley discusses this in a comment on this SO question. Another approach would be to separately stack the x and y columns, as in the code below, but I don't see why that would be any better than your version. In fact, it could be worse if there are cases where the x and y values end up out of correspondence:

bind_cols(anscombe %>% select(matches("x")) %>% gather(set, "x"),
          anscombe %>% select(matches("y")) %>% gather(key, "y")) %>%
  select(-key) %>%
  mutate(set = gsub("x", "Set: ", set))
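Going back to the first point, about spread() needing a column that uniquely identifies each observation: here is a minimal sketch of what happens when obs_num is left out. The names long, dupes and result are just illustrative, and the tryCatch() is only there so the snippet runs to the end instead of stopping at the error.

```r
library(dplyr)
library(tidyr)

# Same gather/separate steps as in the question, but with no obs_num column
long <- anscombe %>%
  gather(variable, value) %>%
  separate(variable, c("variable", "set"), 1)

# Each (set, variable) pair now appears once per original row, so nothing
# says which x value belongs with which y value
dupes <- count(long, set, variable)

# spread() refuses to guess the pairing and raises an error;
# capture the message rather than halting the script
result <- tryCatch(
  spread(long, variable, value),
  error = function(e) conditionMessage(e)
)
```

Printing dupes shows the repeated keys, and result holds the error message from spread() rather than a data frame. With obs_num present, each (obs_num, set) pair is unique, which is why the original pipeline works.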

Another option would be to use base reshape, which is more succinct:

anscombe %>% 
  reshape(varying=1:8, direction="long", sep="", timevar="set")
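If a newer tidyr is available, there may be a way to skip the helper column entirely. This is a sketch that assumes tidyr 1.0.0 or later, where pivot_longer() accepts the special ".value" entry in names_to; that function is not part of the answers above. The name tidy_anscombe is just illustrative.

```r
library(tidyr)

# names_pattern splits each column name into two one-character pieces:
# the first piece ("x" or "y") becomes an output column via ".value",
# the second piece ("1".."4") is stored in the new `set` column.
tidy_anscombe <- pivot_longer(
  anscombe,
  cols = everything(),
  names_to = c(".value", "set"),
  names_pattern = "(.)(.)"
)
```

Because x and y from the same original row are reshaped together, they stay paired without an obs_num column, and the result can go straight into the same ggplot() call as in the question.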