CBrauer - 9 months ago
R Question

Why are the colors wrong on this ggplot?

I am new to ggplot2 so please have mercy on me.

My first attempt produces a strange result (at least it's strange to me). My reproducible R code is:

library(ggplot2)

iterations = 7
variables = 14
data <- matrix(ncol=variables, nrow=iterations)

data[1,] = c(0,0,0,0,0,0,0,0,10134,10234,10234,10634,12395,12395)
data[2,] = c(18596,18596,18596,18596,19265,19265,19390,19962,19962,19962,19962,20856,20856,21756)
data[3,] = c(7912,11502,12141,12531,12718,12968,13386,17998,19996,20226,20388,20583,20879,21367)
data[4,] = c(0,0,0,0,0,0,0,43300,43500,44700,45100,45100,45200,45200)
data[5,] = c(11909,11909,12802,12802,12802,13202,13307,13808,21508,21508,21508,22008,22008,22608)
data[6,] = c(11622,11622,11622,13802,14002,15203,15437,15437,15437,15437,15554,15554,15755,16955)
data[7,] = c(8626,8626,8626,9158,9158,9158,9458,9458,9458,9458,9458,9458,9558,11438)

df <- data.frame(data)
n_data_rows = nrow(df)

previous_volumes = df[1:(n_data_rows-1),]/1000
todays_volume = df[n_data_rows,]/1000

time = seq(ncol(df))/6
min_y = min(previous_volumes, todays_volume)
max_y = max(previous_volumes, todays_volume)
ylimit = c(min_y, max_y)
x = seq(nrow(previous_volumes))

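# This gives a plot with 6 gray lines and one red line, but no legend

p = ggplot()

for (row in x) {
  y1 = as.integer(previous_volumes[row,])
  dd = data.frame(time, y1)
  p = p + geom_line(data=dd, aes(x=time, y=y1, group="1"), color="gray")
}

y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2"), color="red")
print(p)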

This code produces a correct plot... but no legend. The plot looks like:
[image: six gray lines and one red line, no legend]

If I move "color" inside "aes", I now get a legend... but the colors are wrong.
For example, the code:

p = ggplot()

for (row in x) {
  y1 = as.integer(previous_volumes[row,])
  dd = data.frame(time, y1)
  p = p + geom_line(data=dd, aes(x=time, y=y1, group="1", color="gray"))
}

y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
print(p)


[image: the same plot, now with a legend but with the wrong line colors]

Why are the line colors wrong?


Answer

Baptiste is right: you should take the time to read the documentation, which many people have spent thousands of hours developing and making as clear as possible. The fact that you have added layers iteratively (this is necessary only in very rare circumstances with ggplot2) indicates that your present understanding of the most fundamental concepts of ggplot2 sits at about 0 out of .Machine$double.xmax.

No matter, here is a solution:

# Put data as points, 1 per row, with series as columns
df.new           = as.data.frame(t(previous_volumes))

# Rename the series, for colour mapping
colnames(df.new) = LETTERS[1:ncol(df.new)]

# Add the times for each point
df.new$Times     = seq(0,1,length.out = nrow(df.new))

# Put in long format, to enable mapping of the 'variable' to colour
df.new.melt      = reshape2::melt(df.new,'Times')

# Now use the long-format data for constructing the base plot
# Map 'variable' to colour --> plot all series at once
base = ggplot(data=df.new.melt,aes(x=Times,y=value,color=variable)) +
  geom_path() +
  labs(title = "Volumes Time Series",
       x = "Times",y = "Volumes",color="Series")

[image: example plot without labels]
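If the long format is unfamiliar, the sketch below builds the same shape by hand on a tiny made-up table, using base R only. The data frame `wide` and its values are assumptions for illustration; the column names Times, variable and value are the ones that reshape2::melt(df.new, 'Times') produces above.

```r
# Hypothetical wide table: one row per time point, one column per series.
wide <- data.frame(Times = c(0, 0.5, 1),
                   A     = c(1, 2, 3),
                   B     = c(4, 5, 6))

# Long format: one row per (time, series) pair.
long <- data.frame(
  Times    = rep(wide$Times, times = 2),                    # times repeated for each series
  variable = factor(rep(c("A", "B"), each = nrow(wide))),   # which series the row belongs to
  value    = c(wide$A, wide$B)                              # the measurements, stacked
)

print(long)
```

With one column naming the series, a single aes(color = variable) is enough to give every series its own colour and a legend entry.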

Given the base, you can also add additional geometries, say marking the last point in each series.

# Additionally, add a label at the last point in each series.
df.new.melt.labs = plyr::ddply(df.new.melt,'variable',function(df){
  df       = tail(df,1) # Last point
  df$label = sprintf("%.2f",df$value)
  df                    # Return the labelled row
})
base2 = base + geom_label(data = df.new.melt.labs,aes(label=label),
                          position = position_nudge(y=1.5),size=3,show.legend = FALSE)

[image: example plot with the last point of each series labelled]
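As for why the colours came out wrong in the question: inside aes(), color="gray" is not read as a colour name. It is treated as a data value, so ggplot2 assigns it a colour from its default palette and uses the string only as the legend label. One way to get gray and red lines together with a legend is to map colour to a grouping column and choose the actual colours with scale_colour_manual(). This is only a minimal sketch: the toy data, the column names `series` and `day`, and the labels "previous" and "today" are assumptions for illustration, not part of the original data.

```r
library(ggplot2)

# Hypothetical toy data in long format: three series, the last one is "today".
toy <- data.frame(
  time   = rep(1:4, times = 3),
  value  = c(1, 2, 3, 4,  2, 3, 3, 5,  0, 1, 4, 6),
  series = rep(c("A", "B", "C"), each = 4)
)

# 'day' is the variable that colour is mapped to (an assumed grouping).
toy$day <- ifelse(toy$series == "C", "today", "previous")

p <- ggplot(toy, aes(x = time, y = value, group = series, colour = day)) +
  geom_line() +
  # The scale, not aes(), decides which actual colour each value gets.
  scale_colour_manual(values = c(previous = "gray", today = "red"),
                      name = "Volume")
print(p)
```

The same idea should carry over to the melted data above by adding today's series and a column that distinguishes it from the previous days.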