Dominique - 2 months ago - 19
R Question

How do I add multiple traces/trendlines, each fitted to a subset of the data, to a single scatterplot in R using plotly?

I am using the ggplot() function (from ggplot2) and the ggplotly() function (from the plotly package). I want to create a scatterplot with the amount of CWD on the x-axis and the amount of Lawn on the y-axis. There should be three lines per scatterplot, each showing the linear relationship for one year (2013, 2014, 2015). Each year has 10 points for Lawn and 10 points for CWD. A sample of my data is below:

Year CWD Lawn
1 2013 0 420
2 2013 6 390
3 2013 14 410
4 2013 12 349
5 2013 3 348
6 2013 46 354
7 2013 121 311
8 2013 56 381
9 2013 42 386
10 2013 26 381
11 2014 2 121
12 2014 2 163
13 2014 3 298


And here is the code I'm using:

library(plotly)  # also attaches ggplot2

### Amount of Lawn versus Amount of CWD
fit <- lm(Lawn ~ CWD, data = data)  # one regression over all years combined
lawn <- ggplot(data, aes(x = CWD, y = Lawn, colour = Year)) + geom_point()
ggplotly(lawn) %>%
  add_trace(x = data$CWD, y = fitted(fit), type = "scatter", mode = "lines")


I know that the above code is incorrect because it fits only one line to the graph, without considering year. I have tried using geom_abline, but I don't know how to pass it only a subset of the data (one year at a time).

So, first: how do I plot three traces (one for each year)? Should I be importing my data into R as a separate .csv file per year? Surely there is an easier way to do this in the code.
Second: how do I change the colours of the dots and lines?

Answer

The easiest way to do this is within ggplot2 itself, using geom_smooth() to fit the regression for each year for you:

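library(plotly)  # attaches ggplot2 as well

lawn <- ggplot(dat, aes(x = CWD, y = Lawn, colour = factor(Year))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)  # one straight-line fit per colour (year)
ggplotly(lawn)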

Note that I've named the data dat, since data is a function in R.
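Since the question mentions geom_abline(), here is a minimal sketch of that route as well, assuming the Year/CWD/Lawn layout shown in the question and using only the 13 sample rows that were posted. geom_abline() does not fit anything itself, so the per-year intercepts and slopes are computed first with lm() and handed to it as its own data frame (coefs and lawn_abline are names chosen for illustration).

```r
library(ggplot2)

# The 13 sample rows posted in the question (the rest of the data was not shown)
dat <- data.frame(
  Year = c(rep(2013, 10), rep(2014, 3)),
  CWD  = c(0, 6, 14, 12, 3, 46, 121, 56, 42, 26, 2, 2, 3),
  Lawn = c(420, 390, 410, 349, 348, 354, 311, 381, 386, 381, 121, 163, 298)
)

# Fit one lm() per year and keep its intercept and slope
coefs <- do.call(rbind, lapply(split(dat, dat$Year), function(d) {
  cf <- coef(lm(Lawn ~ CWD, data = d))
  data.frame(Year = d$Year[1], intercept = cf[[1]], slope = cf[[2]])
}))

# geom_abline() gets its own data, so each row of coefs becomes one line
lawn_abline <- ggplot(dat, aes(x = CWD, y = Lawn, colour = factor(Year))) +
  geom_point() +
  geom_abline(data = coefs,
              aes(intercept = intercept, slope = slope, colour = factor(Year)))
```

Unlike geom_smooth(), these lines extend across the full x-range of the panel rather than stopping at each year's own range of CWD values.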

With your sample data, the geom_smooth() version draws one coloured regression line per year.
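If you would rather stay closer to the add_trace() idea in the question, a rough sketch is to fit lm() separately for each year, store the fitted values next to the data, and let plot_ly()'s color argument split both the markers and the lines into one trace per year. It assumes the same Year/CWD/Lawn columns and uses only the posted sample rows; the fitted column and the object p are names introduced for illustration.

```r
library(plotly)

# The 13 sample rows posted in the question
dat <- data.frame(
  Year = factor(c(rep(2013, 10), rep(2014, 3))),
  CWD  = c(0, 6, 14, 12, 3, 46, 121, 56, 42, 26, 2, 2, 3),
  Lawn = c(420, 390, 410, 349, 348, 354, 311, 381, 386, 381, 121, 163, 298)
)

# Fitted values from a separate lm() per year
dat$fitted <- NA_real_
for (yr in levels(dat$Year)) {
  rows <- dat$Year == yr
  dat$fitted[rows] <- fitted(lm(Lawn ~ CWD, data = dat[rows, ]))
}

# Sort by CWD within each year so every line is drawn left to right
dat <- dat[order(dat$Year, dat$CWD), ]

# color = ~Year splits markers and lines into one trace per year
p <- plot_ly(dat, x = ~CWD, color = ~Year) %>%
  add_markers(y = ~Lawn) %>%
  add_lines(y = ~fitted)
# print(p) displays the interactive plot
```

Sorting matters here because plotly connects line points in row order; without it, a year's line can zigzag back and forth.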

Regarding colour, have a look at ?scale_colour_discrete; to choose specific colours yourself, scale_colour_manual() is the usual tool.
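A minimal sketch of picking colours by hand with scale_colour_manual(), assuming Year is mapped to colour as in the answer above. The colours in year_cols are an arbitrary example, and the vector's names must match the Year values. Because the points and the fitted lines share the same colour aesthetic, one scale sets both.

```r
library(ggplot2)

# The 13 sample rows posted in the question
dat <- data.frame(
  Year = c(rep(2013, 10), rep(2014, 3)),
  CWD  = c(0, 6, 14, 12, 3, 46, 121, 56, 42, 26, 2, 2, 3),
  Lawn = c(420, 390, 410, 349, 348, 354, 311, 381, 386, 381, 121, 163, 298)
)

# Example colours (arbitrary choice), named by year
year_cols <- c("2013" = "darkgreen", "2014" = "orange", "2015" = "steelblue")

# Points and fitted lines both use the colour aesthetic, so this scale sets both
lawn <- ggplot(dat, aes(x = CWD, y = Lawn, colour = factor(Year))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  scale_colour_manual(values = year_cols, name = "Year")
```

Passing the result to ggplotly(lawn) should carry these colours over to the interactive version.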
