El Niño - 1 year ago
R Question

ggplot legend order mismatch

I'm a newbie to R/ggplot, and I have surveyed StackOverflow for similar problems, to no avail. It is probably a trivial problem for the ggplot experts in the house, if so, I look forward to a quick answer!

So here goes: I am trying to plot 4 curves from variables stored in a dataframe, playing with both color and linetype. Unfortunately, the legend labels are mismatched to the curves, which defeats the whole purpose of a legend.

Here is the figure:
[Figure: 4 curves with mismatched labels]

And here is the code that generated it:

# declare variables
alpha = 0.5
m = 500
x = seq(m)
fdr_thresh = x/m*alpha
lvl_thresh = rep(alpha,m)
pvals = sin(2*pi*x/20)
pvalsA = exp(-x/100)*cos(2*pi*x/20)

# plot
df <- data.frame(pvals=pvals, pvalsA = pvalsA, FDR = fdr_thresh, level = lvl_thresh, x=x)
p4 <- ggplot(data = df) + geom_line(aes(x=x, y=pvals,color="Pure Sinusoid",linetype="Pure Sinusoid"))
p4 <- p4 + geom_line(aes(x=x, y=pvalsA,color="Damped Sinusoid",linetype="Damped Sinusoid"))
p4 <- p4 + geom_line(aes(x=x, y=FDR,color = 'FDR', linetype='FDR'))
p4 <- p4 + geom_line(aes(x=x, y=level,color='alpha',linetype='alpha'))
p4 = p4+ scale_linetype_manual(name = "Significance", values=c("Pure Sinusoid"= 1,"Damped Sinusoid" = 1,"FDR" = 2,"alpha" = 3),labels=c("Pure Sinusoid", "Damped Sinusoid", "Sloping line",bquote(alpha == .(alpha))))
p4 = p4 + scale_color_manual(name = "Significance",values=c("Pure Sinusoid"= "Chocolate1","Damped Sinusoid" = 'Chartreuse4',"FDR" = "black","alpha" = "black") , labels=c("Pure Sinusoid", "Damped Sinusoid", "Sloping line",bquote(alpha == .(alpha))))
p4 <- p4 + theme(legend.position = c(0.7, 0.8),legend.title=element_blank(),legend.key = element_rect(fill = "transparent",colour = "transparent"),legend.background = element_rect(fill=alpha('white', 0.8)))

I would be most grateful for a tip on how to do this better. The same plot in Matlab or Python would be correct and have taken 15 minutes, so I must have made a wrong decision somewhere. (and no, choosing R was not that decision: this is part of my effort to learn it)

Answer Source

The "right way" to do this is to melt your data set into one long data frame and then let ggplot take care of plotting the multiple lines for you, rather than issuing separate geom_line() calls ...
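To make "long" concrete, here is a rough base-R sketch of the shape that the melt step produces. The toy values and the names `wide`, `curves` and `long` are made up for illustration; they are not the question's data. The idea is one row per (x, curve) pair, with the curve name stored in a factor column that ggplot can map to colour and linetype.

```r
# Toy wide data frame: one column per curve (illustrative values only)
wide <- data.frame(x     = 1:3,
                   pvals = c(0.1, 0.2, 0.3),
                   FDR   = c(0.01, 0.02, 0.03))

# Build the long form by hand; this mirrors the layout that
# reshape2::melt(wide, id.var = "x") returns (columns x, variable, value)
curves <- setdiff(names(wide), "x")
long <- data.frame(
  x        = rep(wide$x, times = length(curves)),
  # factor levels follow the original column order, which later fixes
  # the order of the legend entries
  variable = factor(rep(curves, each = nrow(wide)), levels = curves),
  value    = unlist(wide[curves], use.names = FALSE)
)
print(long)
```

Because `variable` is a factor whose levels follow the column order, unnamed `values` and `labels` given to a manual scale line up with the curves in that same order.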

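# plot
library(ggplot2)
df <- data.frame(pvals, pvalsA, FDR = fdr_thresh, level = lvl_thresh, x)

# reshape to long form: one row per (x, variable) pair
mm <- reshape2::melt(df,id.var="x")
# labels in the same order as the melted columns (pvals, pvalsA, FDR, level)
my.labs <- c("Pure Sinusoid","Damped Sinusoid","Sloping line",
             bquote(alpha == .(alpha)))
p4 <- ggplot(data = mm, aes(x,value,colour=variable,linetype=variable)) +
    geom_line()
p4 <- p4 + scale_linetype_manual(name="Significance",values=c(1,1,2,3),
                                 labels=my.labs)
# colours as in the question's scale_color_manual() call
p4 <- p4 + scale_color_manual(name = "Significance",
                              values=c("Chocolate1","Chartreuse4","black","black"),
                              labels=my.labs)
p4 <- p4 +  theme(legend.position = c(0.7, 0.8),
                  legend.key = element_rect(fill = "transparent",
                                            colour = "transparent"),
                  legend.background = element_rect(fill=alpha('white', 0.8)))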

[Figure: resulting plot]

(I already had theme_bw() set in the R session I was working in, which is why the gray background/grid lines are missing ...)
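As a side note on why the original code mislabels the curves: as far as I can tell, when the aesthetics are mapped to character strings, the scale orders its legend entries by the sorted strings, while an unnamed `labels` vector is applied by position. The two orderings then disagree. If you would rather keep the separate geom_line() calls, a minimal sketch of a fix is to pass `breaks` explicitly in the same order as `labels`. The names `alpha_level`, `brks` and `labs` are mine, and the last label is a plain-text stand-in for the question's plotmath label.

```r
library(ggplot2)

alpha_level <- 0.5   # renamed from `alpha` so it does not shadow ggplot2's alpha()
m <- 500
x <- seq(m)
df <- data.frame(x      = x,
                 pvals  = sin(2 * pi * x / 20),
                 pvalsA = exp(-x / 100) * cos(2 * pi * x / 20),
                 FDR    = x / m * alpha_level,
                 level  = rep(alpha_level, m))

# One shared ordering: entry i of brks and entry i of labs describe the same curve
brks <- c("Pure Sinusoid", "Damped Sinusoid", "FDR", "alpha")
labs <- c("Pure Sinusoid", "Damped Sinusoid", "Sloping line",
          paste0("\u03b1 = ", alpha_level))  # plain-text label, not plotmath

p <- ggplot(df, aes(x = x)) +
  geom_line(aes(y = pvals,  colour = "Pure Sinusoid",   linetype = "Pure Sinusoid")) +
  geom_line(aes(y = pvalsA, colour = "Damped Sinusoid", linetype = "Damped Sinusoid")) +
  geom_line(aes(y = FDR,    colour = "FDR",             linetype = "FDR")) +
  geom_line(aes(y = level,  colour = "alpha",           linetype = "alpha")) +
  # identical name, breaks and labels on both scales so the legends merge into one
  scale_linetype_manual(name = "Significance", breaks = brks, labels = labs,
                        values = c("Pure Sinusoid" = 1, "Damped Sinusoid" = 1,
                                   "FDR" = 2, "alpha" = 3)) +
  scale_colour_manual(name = "Significance", breaks = brks, labels = labs,
                      values = c("Pure Sinusoid" = "Chocolate1",
                                 "Damped Sinusoid" = "Chartreuse4",
                                 "FDR" = "black", "alpha" = "black"))
```

Printing `p` draws the plot. The melted version above is still the more idiomatic route; this variant only shows that the mismatch comes from ordering rather than from the separate layers themselves.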