Sam Zipper - 1 year ago 157

# R: facet_grid plot of differences between groups using ggplot2

I'm trying to create a series of plots showing differences between groups of a measured variable, and am searching for an efficient way to do this using the `facet_grid` feature of `ggplot2` in R.

Here is an illustrative example:

```r
library(ggplot2)

# sample input data
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

# make a sample plot
p <-
  ggplot(df, aes(x = year, y = value)) +
  geom_point() + geom_line() +
  facet_grid(scenario ~ scenario)
```

This produces the following sample plot, in which `value` is plotted against `year` separately for each scenario combination:

(I assume the second row is not plotted because it is identical to the first).

However, what I am looking for is a plot where, in each facet, (value in scenario on top) - (value in scenario on right) is plotted by year. Specifically:

• Upper left plot would be (value a) - (value a) = 0 for all years.

• Upper right plot would be (value b) - (value a) = 1 for all years.

• Lower left plot would be (value a) - (value b) = -1 for all years.

• Lower right plot would be (value b) - (value b) = 0 for all years.

I have not been able to find any built-in or automated difference option for `facet_grid`. My initial thought was to pass a function as the `y` argument to `ggplot`, but since the data frame has a single `value` column, I got stumped. I am guessing there might be a solution using some combination of `dplyr` and `reshape2`, but I cannot wrap my head around how to implement it.

Here is an option using some functions from `tidyr` to first `spread` the data to allow contrasts to be calculated, then `gather`ing it back together to allow plotting:

```r
library(dplyr)
library(tidyr)

forPlotting <-
  df %>%
  spread(scenario, value) %>%   # one column per scenario: year, a, b
  mutate(`a - b` = a - b
         , `b - a` = b - a
         , `a - a` = 0
         , `b - b` = 0) %>%
  gather(Comparison, Difference, -(year:b) ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ")
```

That returns a data.frame like so (just the head here):

```
  year a b First Val Second Val Difference
1 2011 0 1         a          b         -1
2 2012 1 2         a          b         -1
3 2013 2 3         a          b         -1
4 2014 3 4         a          b         -1
5 2015 4 5         a          b         -1
6 2011 0 1         b          a          1
```
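For reference, the `spread` step is what makes the contrasts possible: it turns the long data from the question into one column per scenario, so that `a` and `b` can be subtracted row by row. A minimal sketch of just that step, using the question's sample data (the name `wide` is only for illustration):

```r
library(tidyr)

# Sample data from the question (long format)
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

# One row per year, one column per scenario
wide <- spread(df, scenario, value)
wide
```

The result has the columns `year`, `a`, and `b`, which is the shape the `mutate` call expects.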

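You can then plot it like so. Each facet shows (row label) - (column label); swap the two sides of the facet formula if you want the (top) - (right) layout from the question:

```r
ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`)
```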
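If reshaping feels like a detour, the same differences can probably be had by joining the long data to itself on `year`. The following is only a sketch under that assumption, not the approach used above: it relies on base R's `merge`, and the suffixes `_top` and `_right` as well as the names `pairs` and `p_diff` are made up for illustration. The facet formula is arranged so that each panel shows (top) - (right), as described in the question.

```r
library(ggplot2)

# Sample data from the question (long format: one row per year and scenario)
df <- data.frame(year = rep(2011:2015, 2),
                 value = c(0:4, 1:5),
                 scenario = rep(c("a", "b"), each = 5))

# Pair every scenario with every scenario within the same year
pairs <- merge(df, df, by = "year", suffixes = c("_top", "_right"))
pairs$Difference <- pairs$value_top - pairs$value_right

# facet_grid(rows ~ cols): row labels appear on the right, column labels on top
p_diff <- ggplot(pairs, aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(scenario_right ~ scenario_top)
p_diff
```

Because the join works on the scenario values rather than on column names, it should scale to more scenarios without listing the contrasts by hand.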

The bigger question is why you want to do this. I assume that you already know that just plotting the two sets as different color lines is an easier visualization:

```r
ggplot(df, aes(x = year, y = value, col = scenario)) +
  geom_point() + geom_line()
```

So, I am assuming that you have more complicated data, specifically with lots more columns to compare. Here is an approach that will automate (and simplify) many of the above steps for multiple columns. The approach is basically the same, but it uses `mutate_` to allow you to pass in a vector with the columns you are trying to create. (Note that `mutate_` has been deprecated since dplyr 0.7.0, so newer versions may emit a warning.)

```r
df <-
  data.frame(
    year = 2011:2015
    , a = 0:4
    , b = 1:5
    , c = 2:6
    , d = 3:7
  )

allContrasts <-
  outer(colnames(df)[-1]
        , colnames(df)[-1]
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select(-(a:d)) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))
```

Gives this:

Why can I not leave this alone? I just like playing with standard evaluation too much. If you have non-parsing column names (e.g., names with spaces), the above will fail. So, here is an example with such column names, showing the addition of backticks to ensure the columns parse correctly.

```r
df <-
  data.frame(
    year = 2011:2015
    , value = c(0:4, 1:5, 2:6, 3:7)
    , scenario = rep(c("Unit 1", "Exam 2"
                       , "Homework", "Final Exam")
                     , each = 5)
  ) %>%
  spread(scenario, value)   # one column per scenario, names contain spaces

allContrasts <-
  outer(paste0("`", colnames(df)[-1], "`")
        , paste0("`", colnames(df)[-1], "`")
        , paste
        , sep = " - ") %>%
  as.character() %>%
  setNames(., .) %>%
  as.list()

forPlotting <-
  df %>%
  mutate_(.dots = allContrasts) %>%
  select_(.dots = paste0("-`", colnames(df)[-1], "`")) %>%
  gather(Comparison, Difference, -year ) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ") %>%
  filter(`First Val` != `Second Val`) %>%
  mutate_each(funs(gsub("`", "", .)), `First Val`, `Second Val`)

ggplot(forPlotting
       , aes(x = year, y = Difference)) +
  geom_point() + geom_line() +
  facet_grid(`First Val` ~ `Second Val`) +
  theme(axis.text.x = element_text(angle = 90))
```
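The backtick juggling can likely be avoided by building each contrast as an unevaluated call from symbols, rather than pasting strings together, and splicing the list into `mutate` with `!!!`. This is a hedged sketch of that idea, not a drop-in replacement for the code above: it assumes dplyr >= 1.0 (for `all_of`), and the names `scores`, `combos`, `contrasts`, and `forPlotting2` are invented for illustration, with only two columns to keep it short.

```r
library(dplyr)
library(tidyr)

# Wide data whose column names would need backticks if typed by hand
scores <- data.frame(year = 2011:2015,
                     `Unit 1` = 0:4, `Exam 2` = 1:5,
                     check.names = FALSE)

cols <- setdiff(names(scores), "year")
combos <- expand.grid(first = cols, second = cols, stringsAsFactors = FALSE)
combos <- combos[combos$first != combos$second, ]

# One unevaluated call per contrast, equivalent to `Unit 1` - `Exam 2`
contrasts <- Map(function(f, s) call("-", as.name(f), as.name(s)),
                 combos$first, combos$second)
names(contrasts) <- paste(combos$first, combos$second, sep = " - ")

forPlotting2 <- scores %>%
  mutate(!!!contrasts) %>%          # splice the named calls in as new columns
  select(-all_of(cols)) %>%
  gather(Comparison, Difference, -year) %>%
  separate(Comparison, c("First Val", "Second Val"), " - ")
```

Since `as.name` produces symbols directly, no backticks end up in the labels, so the final `gsub` clean-up step is not needed in this variant.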
