Sam Zipper - 22 days ago
R Question

R: facet_grid plot of differences between groups using ggplot2

I'm trying to create a series of plots showing differences between groups of a measured variable, and am searching for an efficient way to do this using the facet_grid feature of ggplot2 in R.

Here is an illustrative example:

library(ggplot2)

# sample input data
df <- data.frame(year=rep(c(2011:2015), 2),
                 value=c(0:4, 1:5),
                 scenario=rep(c("a","b"), each=5))

# make a sample plot
p <-
  ggplot(df, aes(x=year, y=value)) +
  geom_point() + geom_line() +
  facet_grid(scenario ~ scenario)
p  # print the plot


This produces the following sample plot, in which value is plotted against year separately for each scenario combination:

sample facet plot

(I assume the second row is not plotted because it is identical to the first).

However, what I am looking for is a plot where, in each facet, (value in scenario on top) - (value in scenario on right) is plotted by year. Specifically:


  • Upper left plot would be (value a) - (value a) = 0 for all years.

  • Upper right plot would be (value b) - (value a) = 1 for all years.

  • Lower left plot would be (value a) - (value b) = -1 for all years.

  • Lower right plot would be (value b) - (value b) = 0 for all years.
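As a quick sanity check of those four target values, here is a small base-R sketch added for illustration (not part of the original question); the variable names a, b, upper_left and so on are hypothetical. It simply computes each facet's difference directly from the sample data:

```r
# Sample data from the question
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

# Each scenario's values as a vector; rows are already in year order
a <- df$value[df$scenario == "a"]
b <- df$value[df$scenario == "b"]

# (value in top scenario) - (value in right scenario) for each facet
upper_left  <- a - a  # 0 0 0 0 0
upper_right <- b - a  # 1 1 1 1 1
lower_left  <- a - b  # -1 -1 -1 -1 -1
lower_right <- b - b  # 0 0 0 0 0
```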



I have not been able to find any built-in or automated difference command for facet_grid. My initial thought was to pass a function as the y argument to ggplot, but given that the data frame has a single value column I got stumped. I am guessing there might be a solution using some combination of dplyr and reshape2, but I cannot wrap my head around how to implement it.

Answer

Here is an option that uses functions from tidyr (together with dplyr's mutate and the %>% pipe) to first spread the data so the contrasts can be calculated, and then gather it back together for plotting:

library(dplyr)
library(tidyr)

forPlotting <-
  df %>%
  spread(scenario, value) %>%
  mutate(`a - b` = a - b
         , `b - a` = b - a
         , `a - a` = 0
         , `b - b` = 0) %>%
  gather(Comparison, Difference, -(year:b) ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ")

That returns a data.frame like so (just the head here):

  year a b First Val Second Val Difference
1 2011 0 1         a          b         -1
2 2012 1 2         a          b         -1
3 2013 2 3         a          b         -1
4 2014 3 4         a          b         -1
5 2015 4 5         a          b         -1
6 2011 0 1         b          a          1

And you can plot like so:

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`)

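spread() and gather() still work, but in tidyr 1.0 and later they are superseded by pivot_wider() and pivot_longer(). As a hedged sketch (assuming tidyr >= 1.0; this is not the answer's original code), the same forPlotting table can be built with the newer functions, with pivot_longer()'s names_sep argument doing the job of separate():

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

forPlotting <- df %>%
  # one column per scenario: year, a, b
  pivot_wider(names_from = scenario, values_from = value) %>%
  # one column per contrast, named "first - second"
  mutate(`a - b` = a - b,
         `b - a` = b - a,
         `a - a` = 0,
         `b - b` = 0) %>%
  # back to long form, splitting "a - b" into two label columns
  pivot_longer(cols = -c(year, a, b),
               names_to = c("First Val", "Second Val"),
               names_sep = " - ",
               values_to = "Difference")
```

The rows come out ordered by year rather than by comparison, so the head differs from the table above, but the facet_grid(`First Val` ~ `Second Val`) plotting call works unchanged.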

The bigger question is why you want to do this. I assume that you already know that simply plotting the two sets as differently colored lines gives an easier-to-read visualization:

ggplot(df, aes(x=year, y=value, col = scenario)) +
  geom_point() + geom_line()


So, I am assuming that you have more complicated data, specifically with many more columns to compare. Here is an approach that automates (and simplifies) many of the above steps for multiple columns. The approach is basically the same, but it uses mutate_ (the standard-evaluation variant of mutate, since deprecated in dplyr) so that you can pass in a list of the new columns you are trying to create, each written as a string.

df <-
  data.frame(
    year = 2011:2015
    , a = 0:4
    , b = 1:5
    , c = 2:6
    , d = 3:7
  )

allContrasts <-
  outer(colnames(df)[-1]
        , colnames(df)[-1]
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select(-(a:d)) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))

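Because mutate_() and its .dots argument are deprecated in current dplyr, here is a minimal sketch of the same contrast-building step using tidy evaluation instead, assuming dplyr 0.7 or later and R 3.6 or later (for base str2lang()). The names contrastStrings, contrastExprs and withContrasts are illustrative, not from the original answer:

```r
library(dplyr)

df <- data.frame(year = 2011:2015, a = 0:4, b = 1:5, c = 2:6, d = 3:7)

# Every ordered pair of value columns as a string, e.g. "b - a"
contrastStrings <- as.character(
  outer(colnames(df)[-1], colnames(df)[-1], paste, sep = " - ")
)

# Parse each string into an R call and name it by the string itself
contrastExprs <- setNames(lapply(contrastStrings, str2lang), contrastStrings)

# Splice the named calls into mutate() with !!!; each becomes a new column
withContrasts <- df %>% mutate(!!!contrastExprs)
```

From withContrasts, the select(), gather(), separate() and filter() steps above should apply unchanged.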

Why can I not leave this alone? I just like playing with standard evaluation too much. If you have non-syntactic column names (e.g., names containing spaces), the approach above will fail, because each contrast string is parsed as R code. So, here is an example with such column names, showing how adding backticks makes the columns parse correctly.

df <-
  data.frame(
    year = 2011:2015
    , value = c(0:4, 1:5, 2:6, 3:7)
    , scenario = rep(c("Unit 1", "Exam 2"
                       , "Homework", "Final Exam")
                     , each = 5)
  ) %>%
  spread(scenario, value)

allContrasts <-
  outer(paste0("`", colnames(df)[-1], "`")
        , paste0("`", colnames(df)[-1], "`")
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select_(.dots = paste0("-`", colnames(df)[-1], "`")) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`) %>%
  mutate_each(funs(gsub("`", "", .)), `First Val`, `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))

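A different technique, swapped in here as a sketch rather than taken from the answer, sidesteps the parsing problem entirely: join the long-format data to itself on year, so every scenario is paired with every other scenario and the names are only ever treated as data, never as code. This assumes base R's merge() plus ggplot2; the names long and paired are illustrative:

```r
library(ggplot2)

# Long-format data whose scenario names contain spaces
# (same values as the example above, before spread())
long <- data.frame(
  year = 2011:2015,
  value = c(0:4, 1:5, 2:6, 3:7),
  scenario = rep(c("Unit 1", "Exam 2", "Homework", "Final Exam"), each = 5)
)

# Self-join on year: each row is paired with every scenario's row for the
# same year, giving columns scenario.first, value.first, scenario.second, ...
paired <- merge(long, long, by = "year", suffixes = c(".first", ".second"))

# Drop the trivial self-comparisons and compute the contrast
paired <- paired[paired$scenario.first != paired$scenario.second, ]
paired$Difference <- paired$value.first - paired$value.second

# Rows are the first scenario, columns the second, as in the plots above
p <- ggplot(paired, aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(scenario.first ~ scenario.second) +
  theme(axis.text.x = element_text(angle = 90))
print(p)
```

Since no strings are parsed, no backtick quoting or gsub() cleanup is needed, and the same code works for any number of scenarios.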
