Nonancourt - 22 days ago 5
R Question

How to format the scatterplots of data series in R

I have been struggling to create a decent-looking scatterplot in R. I wouldn't have thought it would be so difficult.
After some research, it seemed to me that ggplot would be a good choice, since it allows plenty of formatting. However, I'm struggling to understand how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:

year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]

ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')


Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?


Answer

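Don't build 2 different data frames. Keep both years in one data frame and map year to both color and shape. The colors in your plot don't match because a string inside aes() is treated as a data value and mapped to the default palette, not used as a literal color. Manual scales set the actual colors and shapes, and the guides give the legend its title: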

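library(ggplot2)  # provides ggplot() and the mpg dataset

# Keep both years in one data frame and turn year into a factor (discrete variable)
df <- mpg[mpg$year %in% c(1999, 2008), ]
df$year <- as.factor(df$year)

ggplot() +
  geom_point(data = df, aes(x = cty, y = hwy, color = year, shape = year)) +
  xlab('cty') +
  ylab('hwy') +
  # values are assigned to the factor levels in order: 1999 first, then 2008
  scale_color_manual(values = c("green", "yellow")) +
  scale_shape_manual(values = c(2, 8)) +
  guides(colour = guide_legend("Year"),
         shape = guide_legend("Year"))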
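
If it helps to check the difference between mapping a color inside aes() and setting one outside it, here is a minimal sketch. It assumes ggplot2 is installed. The object names (p_mapped, p_fixed, year_colors, year_shapes, p) are made up for illustration. It also uses named vectors in the manual scales, which is one possible way to tie each color and shape to a specific year regardless of level order, here following the 1999 = yellow, 2008 = green pairing from the question.

```r
library(ggplot2)  # provides ggplot() and the mpg dataset

year1 <- mpg[mpg$year == 1999, ]

# Inside aes(): "yellow" is treated as a data value and mapped to a palette color
p_mapped <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy, color = "yellow"))

# Outside aes(): "yellow" is used as the literal point color (no legend is drawn)
p_fixed <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy), color = "yellow")

# One data frame, with year as a factor so it can drive color and shape
df <- mpg[mpg$year %in% c(1999, 2008), ]
df$year <- as.factor(df$year)

# Named values pin each color and shape to a factor level (illustrative names)
year_colors <- c("1999" = "yellow", "2008" = "green")
year_shapes <- c("1999" = 2, "2008" = 8)

p <- ggplot(df, aes(x = cty, y = hwy, color = year, shape = year)) +
  geom_point() +
  # giving both scales the same name merges them into a single legend
  scale_color_manual(name = "Year", values = year_colors) +
  scale_shape_manual(name = "Year", values = year_shapes) +
  labs(x = "cty", y = "hwy")

print(p)
```

Printing p_mapped and p_fixed side by side shows the mismatch from the question. Setting the color outside aes() fixes the color but gives no legend, which is why mapping a factor and using a manual scale is the better fit when series names are needed.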