dan dan - 2 months ago 26
R Question

Color coding and legend labels in ggplot

I have data which I'd like to plot using ggplot's geom_point:

set.seed(1)
df <- data.frame(x=rnorm(100),y=rnorm(100),val=c(rnorm(90),rep(NA,10)))


I add colors according to intervals of df$val:

intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
                           start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
                           col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=F)
df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
  if(is.na(x)){
    return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)]))
  } else{
    idx <- which(intervals.df$start <= x & intervals.df$end >= x)
    return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx]))
  }
})))
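
As an aside, the row-by-row lookup above could probably be written in vectorized form with cut(). This is only a minimal sketch under the same intervals.df. The names n, breaks and idx are introduced here for illustration. It assumes the intervals are contiguous and right-closed, as their labels suggest. Unlike the which() version, a value exactly equal to -3 or outside the covered range would come back as NA here.

```r
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100), val = c(rnorm(90), rep(NA, 10)))

intervals.df <- data.frame(
  interval = c("(-3,-2]", "(-2,-0.999]", "(-0.999,0]", "(0,1.96]",
               "(1.96,3.91]", "(3.91,5.87]", "not expressed"),
  start = c(-3, -2, -0.999, 0, 1.96, 3.91, NA),
  end   = c(-2, -0.999, 0, 1.96, 3.91, 5.87, NA),
  col   = c("#2f3b61", "#436CE8", "#E0E0FF", "#7d4343",
            "#C74747", "#EBCCD6", "#D3D3D3"),
  stringsAsFactors = FALSE
)

n <- nrow(intervals.df)  # the last row is the "not expressed" entry

# Break points: the first start followed by every non-NA end
breaks <- c(intervals.df$start[1], intervals.df$end[-n])

# cut() bins into right-closed intervals (a, b] by default;
# labels = FALSE returns the integer index of the bin
idx <- cut(df$val, breaks = breaks, labels = FALSE)

# Missing values are assigned to the last row of intervals.df
idx[is.na(df$val)] <- n

df$interval <- intervals.df$interval[idx]
df$col <- intervals.df$col[idx]
```

Because idx indexes straight into intervals.df, the color and the label for each point always come from the same row.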


Here I set df$col as a factor and set the labels to be the intervals so I can plot them in the legend:

df$col <- factor(df$col,levels=intervals.df$col,labels=intervals.df$interval)


This will also display all the intervals, including those that df$val might not cover, but I want that.

And here's how I try to plot it:

library(ggplot2)
ggplot(df,aes(x=x,y=y,colour=col))+geom_point(cex=2,shape=1,stroke=1)+labs(x="X",y="Y")+theme_bw()+theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+scale_shape(solid=T)+scale_fill_manual(drop=FALSE,values=levels(df$col),name="DE")


Which gets me close but the colors are not right:
[image: the resulting plot]

So I thought this plot command would correct that (adding scale_color_manual):

ggplot(df,aes(x=x,y=y,colour=col))+geom_point(cex=2,shape=1,stroke=1)+labs(x="X",y="Y")+theme_bw()+theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+scale_shape(solid=T)+scale_fill_manual(drop=FALSE,values=levels(df$col),name="DE")+scale_color_manual(drop=FALSE,values=levels(df$col),name="DE")


But that throws the error:

Error in grDevices::col2rgb(colour, TRUE) : invalid color name '(0,1.96]'


So, how do I get the colors right (and the legend name right too)?

Answer

One option is to map the colors to interval after setting the levels via intervals.df, so that the order and the number of levels are correct. Use the colors from intervals.df, making a named vector of the colors to pass to scale_color_manual.

# Set levels of interval via intervals.df
df$interval = factor(df$interval, levels=intervals.df$interval)

# Named vector of the colors based on intervals.df
colors = intervals.df$col
names(colors) = intervals.df$interval

ggplot(df, aes(x=x, y=y, colour=interval))+
    geom_point(cex=2, shape=1, stroke=1) +
    labs(x="X", y="Y")+
    theme_bw()+
    theme(legend.key=element_blank(),
         panel.border=element_blank(), strip.background=element_blank())+
    scale_color_manual(values = colors, name = "DE")
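
For context on the error in the question: once df$col is rebuilt with labels=intervals.df$interval, levels(df$col) returns the interval labels rather than the hex codes, so scale_color_manual is handed '(0,1.96]' as a color. The first plot only ran because scale_fill_manual has no effect when only colour is mapped, which leaves the default palette in place. The toy sketch below illustrates this on made-up data. The names cols, lab_names, f, pal and toy are hypothetical. It also shows drop = FALSE, which is what asks the scale to keep intervals that have no points. In some recent ggplot2 releases an explicit show.legend = TRUE on the layer may also be needed for those keys to be drawn, so treat that part as an assumption to check against your installed version.

```r
library(ggplot2)

# Toy subset of the colors and labels (hypothetical, for illustration only)
cols <- c("#2f3b61", "#436CE8", "#D3D3D3")
lab_names <- c("(-3,-2]", "(-2,-0.999]", "not expressed")

# After relabelling, the factor levels are the labels, not the hex codes
f <- factor(c("#436CE8", "#D3D3D3"), levels = cols, labels = lab_names)
levels(f)

# Named vector: names are the factor levels, values are the colors
pal <- setNames(cols, lab_names)

toy <- data.frame(x = 1:2, y = 1:2, interval = f)

# drop = FALSE keeps the unused level "(-3,-2]" in the scale
p <- ggplot(toy, aes(x = x, y = y, colour = interval)) +
  geom_point(show.legend = TRUE) +
  scale_color_manual(values = pal, drop = FALSE, name = "DE")
```

Naming the vector ties each color to its level by name rather than by position, so the mapping stays correct even if the order of the levels changes.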