Gui_struggling_with_R - 9 months ago
R Question

Visualizing the difference between two points with ggplot2

I want to visualize the difference between two points with a line/bar in ggplot2.

Suppose we have some data on income and spending as a time series.
We would like to visualize not only these two series, but also the balance (= income - spending).
Furthermore, we would like to indicate whether the balance was positive (=surplus) or negative (=deficit).

I have tried several approaches, but none of them produced a satisfying result. Here is a reproducible example.

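# Load libraries and create LONG example data.frame
library(dplyr)    # filter(), mutate() and the %>% pipe
library(tidyr)    # spread()
library(ggplot2)

# value mixes numbers with "deficit"/"surplus", so it is stored as character (a factor before R 4.0)
df <- data.frame(year = rep(2000:2009, times=3),
var = rep(c("income","spending","balance"), each=10),
value = c(0:9, 9:0, rep(c("deficit","surplus"), each=5)))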
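The hard-coded balance labels can be cross-checked by deriving them from the numbers. This is a minimal base-R sketch that assumes a separate WIDE table; the names `wide` and `net` are hypothetical and not part of the original example. The rule below labels a zero balance as "deficit", which is an arbitrary choice.

```r
# Hypothetical WIDE table with the same numbers as the LONG example
wide <- data.frame(year = 2000:2009,
                   income = 0:9,
                   spending = 9:0)

# balance = income - spending; positive means surplus, otherwise deficit
wide$net <- wide$income - wide$spending
wide$balance <- ifelse(wide$net > 0, "surplus", "deficit")

print(wide)
```

With these numbers the first five years come out as deficit and the last five as surplus, matching `rep(c("deficit","surplus"), each=5)`.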

1.Approach with LONG data

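Unsurprisingly, it doesn't work with LONG data, because the ymin and ymax arguments of geom_linerange cannot be specified correctly.
ymin=value, ymax=value is definitely the wrong way to go, since each line then starts and ends at the same point (expected behaviour).
ymin=income, ymax=spending is obviously wrong, too, because income and spending are not columns of the LONG data (expected behaviour).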

df %>%
ggplot() +
geom_point(aes(x=year, y=value, colour=var)) +
geom_linerange(aes(x=year, ymin=value, ymax=value, colour=var))

(Without library(tidyr), the spread() calls in the next two approaches fail with: Error in function_list[[i]](value) : could not find function "spread")

2.Approach with WIDE data

I almost got it working with WIDE data.
The plot looks good, but the legend for the points (income and spending) is missing (expected behaviour, since their colours are set as constants outside aes()).
Simply adding show.legend = TRUE to the two geom_point calls doesn't solve the problem, as it overprints the balance legend of geom_linerange.
Besides, I would rather have the two geom_point lines of code combined in one (see 1.Approach).

df %>%
spread(var, value, convert = TRUE) %>%   # convert turns income/spending back into numbers
ggplot() +
geom_linerange(aes(x=year, ymin=spending, ymax=income, colour=balance)) +
geom_point(aes(x=year, y=spending), colour="red", size=3) +
geom_point(aes(x=year, y=income), colour="green", size=3) +
ggtitle("income (green) - spending (red) = balance")
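One way to get a legend for the points while staying with WIDE data only is to map a constant label inside aes() (for example `fill = "income"`), so ggplot2 creates a legend key for it. This is a sketch of that idiom, not the accepted answer; the table `wide` is a hypothetical stand-in for the output of `spread(..., convert = TRUE)`. Using fill with the fillable shape 21 for the points keeps them out of the colour legend used by the bars.

```r
library(ggplot2)

# Hypothetical WIDE data matching the question's numbers
wide <- data.frame(year = 2000:2009, income = 0:9, spending = 9:0)
wide$balance <- ifelse(wide$income - wide$spending > 0, "surplus", "deficit")

p <- ggplot(wide, aes(x = year)) +
  # colour legend: balance (deficit/surplus) for the bars
  geom_linerange(aes(ymin = spending, ymax = income, colour = balance)) +
  # fill legend: constant labels mapped inside aes() create legend keys
  geom_point(aes(y = spending, fill = "spending"), shape = 21, size = 3) +
  geom_point(aes(y = income, fill = "income"), shape = 21, size = 3) +
  scale_fill_manual(values = c(income = "green", spending = "red")) +
  labs(y = "value", colour = "balance", fill = "series")

print(p)
```

The two geom_point calls remain, so this does not merge them into one line; it only restores the missing legend.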


3.Approach using LONG and WIDE data

Combining the 1.Approach with the 2.Approach results in yet another unsatisfying plot. The legend does not differentiate between balance and var (=expected behaviour).

ggplot() +
geom_point(data=(df %>% filter(var=="income" | var=="spending") %>%
                 mutate(value = as.numeric(value))),   # value is character in df
aes(x=year, y=value, colour=var)) +
geom_linerange(data=(df %>% spread(var, value, convert = TRUE)),
aes(x=year, ymin=spending, ymax=income, colour=balance))


  • Any (elegant) way out of this dilemma?

  • Should I use some other geom instead of geom_linerange?

  • Is my data in the right format?
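On the data-format question, one option (an assumption about how the data could be organised, not the only valid layout) is to keep only numeric series in the LONG table and derive the WIDE table and the balance from it with tidyr::pivot_wider(), available since tidyr 1.0.0 as the successor to spread(). The names `long` and `wide` are hypothetical.

```r
library(tidyr)

# Hypothetical restructured input: `value` holds numbers only
long <- data.frame(year  = rep(2000:2009, times = 2),
                   var   = rep(c("income", "spending"), each = 10),
                   value = c(0:9, 9:0))

# One row per year, with income and spending side by side for ymin/ymax
wide <- pivot_wider(long, names_from = var, values_from = value)

# Derive the label from the numbers instead of storing it in `value`
wide$balance <- ifelse(wide$income > wide$spending, "surplus", "deficit")

print(wide)
```

`long` can then feed geom_point() and `wide` can feed geom_linerange() through each layer's data argument, as in the 3.Approach, without any type conversion.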

Answer


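library(ggplot2)
library(tidyr)
# value is character in df, so convert the income/spending rows to numbers
points_df <- df[df$var != "balance", ]
points_df$value <- as.numeric(points_df$value)
ggplot(points_df) +
    geom_point(aes(x = year, y = value, fill = var),
        size = 3, pch = 21, colour = alpha("white", 0)) +
    geom_linerange(aes(x = year, ymin = income, ymax = spending, colour = balance),
        data = spread(df, var, value, convert = TRUE)) +
  scale_fill_manual(values = c("green", "red"))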


The main idea is to use two different colour-related aesthetics (fill for the points, which needs a fillable shape such as pch = 21, and colour for the lines), so that each gets its own legend.