watchtower - 4 months ago
R Question

Mapping vs. Setting color in Discrete vs. Continuous case

I am new to ggplot2 and am reading Wickham's book, but I am still trying to wrap my head around mapping vs. setting color.

A) Discrete case
Here's what I did:

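library(ggplot2)
# 50 evenly spaced displ values with loess-predicted hwy (tibble() replaces the deprecated data_frame())
grid <- tibble::tibble(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 50))
mod <- loess(hwy ~ displ, data = mpg)
grid$hwy <- predict(mod, newdata = grid)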

a) Put the color inside aes(), as in aes(color = "blue")

# `test` (definition not shown) is a data frame with displ, hwy and trans columns
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(data = test, aes(label = trans, color = "blue"))

The text is not drawn in blue; instead, a legend with the label "blue" is added. Why does this happen?

b) Set color = "blue" outside aes().

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(data = test, aes(label = trans), color = "blue")

This works and changes the color to "blue".
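One way to see what actually happens in a) and b) is to build both plots and inspect the colour column that ggplot2 computes for the text layer, using layer_data(). This is a minimal sketch: `test <- head(mpg, 5)` is a hypothetical stand-in, since the question does not show how `test` was created.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `test` (its definition is not shown);
# any rows of mpg have the displ, hwy and trans columns used here
test <- head(mpg, 5)

# a) "blue" inside aes(): mapped, so it goes through the colour scale
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(data = test, aes(label = trans, colour = "blue"))

# b) "blue" outside aes(): set, so it is used literally as the colour
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(data = test, aes(label = trans), colour = "blue")

# layer_data(plot, i) returns the computed data for layer i (layer 2 = text)
mapped_cols <- unique(layer_data(p_mapped, 2)$colour)
set_cols <- unique(layer_data(p_set, 2)$colour)

mapped_cols  # a single default palette colour, not "blue"
set_cols     # "blue"
```

The mapped version ends up with whatever the default discrete palette assigns to its only level, which is why the text is not blue.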

B) Continuous case

a) Put the color inside aes(), as in aes(colour = "green"). Here's what I did:

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # linewidth replaced size for lines in ggplot2 3.4.0
  geom_line(data = grid, aes(colour = "green"), linewidth = 1.5)

As in case a) of the discrete case, the line is not green: it is drawn in a pinkish default color, and a legend labeled "green" is added.

b) Set colour outside aes().

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # linewidth replaced size for lines in ggplot2 3.4.0
  geom_line(data = grid, colour = "green", linewidth = 1.5)

Here, the line does turn green, and the legend disappears.

So I don't understand the value of aes(colour = "xyz"). All it seems to do is add a legend. Why would we use it?


Data (data columns or transformations of data columns) goes inside aes(). When you write aes(color = "blue"), it is as if your data had an unnamed column containing the character string "blue" in every row.

ggplot(mpg,aes(displ,hwy)) +
  geom_point() +
  geom_text(data = test, aes(label = trans, color = "blue"))
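The "unnamed column" idea can be checked directly: map the constant, then map a real column that holds the same string in every row, and compare the colours ggplot2 computes. A sketch, assuming a hypothetical `test <- head(mpg, 5)` and a made-up column name `fake_col`.

```r
library(ggplot2)

# Hypothetical stand-in for `test`
test <- head(mpg, 5)

# Constant string inside aes()
p_const <- ggplot(test, aes(displ, hwy)) +
  geom_text(aes(label = trans, colour = "blue"))

# The same thing made explicit: a real column holding "blue" in every row
test_col <- test
test_col$fake_col <- "blue"
p_column <- ggplot(test_col, aes(displ, hwy)) +
  geom_text(aes(label = trans, colour = fake_col))

same_colours <- identical(layer_data(p_const)$colour, layer_data(p_column)$colour)
same_colours  # TRUE
```

Only the legend title differs between the two plots; the colours are computed the same way.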

In this context, "blue" is not a color; it is just a character string. Inside aes(), you will get an identical result (apart from the legend label) with color = "green", color = "bleu", or color = "look at this long long label".
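To check that the string's content is irrelevant, the sketch below builds the same layer with several constant strings and collects the colour each one receives. `mapped_colour()` is a hypothetical helper written for this illustration; `!!` injects the string into aes() as if it had been typed there literally.

```r
library(ggplot2)

# Hypothetical helper: colour the default discrete scale assigns when a
# constant string is mapped to colour inside aes()
mapped_colour <- function(label_text) {
  p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(colour = !!label_text))  # !! inserts the string value itself
  unique(layer_data(p)$colour)
}

strings <- c("blue", "green", "bleu", "look at this long long label")
colours <- vapply(strings, mapped_colour, character(1))
colours  # the same palette colour four times
```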

A character string - even if it only has one value - will be coerced to a factor and treated as a discrete variable.
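A quick way to confirm that the constant string is treated as discrete is to force a continuous colour scale onto it: building the plot then fails. A minimal sketch; scale_colour_gradient() is just one continuous scale that shows this, and the exact error wording varies between ggplot2 versions.

```r
library(ggplot2)

# A constant string mapped inside aes() is a one-level discrete variable,
# so a continuous colour scale cannot be trained on it
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = "blue")) +
  scale_colour_gradient()

# ggplot_build() trains the scales, which is where the error is raised;
# tryCatch() returns the error message instead of stopping
result <- tryCatch(ggplot_build(p), error = function(e) conditionMessage(e))
result  # an error message about discrete values and a continuous scale
```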

This can be confusing if you don't follow the general rule: don't put constants inside aes() - only put mappings to actual data columns.
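Applied to the question's continuous case, the rule could look like the sketch below: the point colour depends on a real column, so it is mapped inside aes(); the line colour and width are constants, so they are set outside. Mapping `drv` is an illustrative choice not in the question, base data.frame() is used to avoid extra packages, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# Loess curve on a grid of displ values, as in the question
grid <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 50))
mod <- loess(hwy ~ displ, data = mpg)
grid$hwy <- predict(mod, newdata = grid)

p <- ggplot(mpg, aes(displ, hwy)) +
  # Mapping: colour comes from a data column (drv), so it goes inside aes()
  geom_point(aes(colour = drv)) +
  # Setting: constant colour and width, so they go outside aes()
  geom_line(data = grid, colour = "green", linewidth = 1.5)
```

Printing `p` gives a legend for `drv` only; the green line needs no legend because nothing about it varies with the data.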

You also seem to be confused about continuous vs. discrete color scales. What you label as the "continuous case" is still discrete as far as color is concerned. Using geom_point, geom_line, a smoothed geom, or any other geom does not make the color scale discrete or continuous. The only thing that decides between a discrete and a continuous color scale is the type (class) of the data mapped to color: if it is numeric, the default color scale is continuous; if it is not numeric, the default color scale is discrete.
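ggplot2 exposes the class-based choice through the exported generic scale_type(), which it consults to pick a default scale for a mapped vector. The sketch below calls it on a few vectors, then shows that switching from geom_point() to geom_line() leaves the colours from a numeric mapping unchanged.

```r
library(ggplot2)

# The default scale type depends only on the class of the mapped vector
scale_type(mpg$cyl)          # integer column: "continuous"
scale_type(factor(mpg$cyl))  # factor: "discrete"
scale_type(mpg$trans)        # character column: "discrete"
scale_type("green")          # a constant string is character too: "discrete"

# The geom does not matter: the same numeric mapping yields the same
# continuous colours whether drawn with points or with a line
p_point <- ggplot(mpg, aes(displ, hwy, colour = cyl)) + geom_point()
p_line  <- ggplot(mpg, aes(displ, hwy, colour = cyl)) + geom_line()

same <- identical(sort(unique(layer_data(p_point)$colour)),
                  sort(unique(layer_data(p_line)$colour)))
same  # TRUE
```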