X.Chi X.Chi - 3 months ago 38
R Question

ggplot2: Adding more than one Regression Line Equations and R2 on one graph

I have a dataset like,

temp diameter carbon

13°C 2429.45 231.2055
13°C 1701.25 112.4063
20°C 2117.25 223.1670
20°C 2028.65 151.5894
27°C 1780.09 129.2269
27°C 1334.35 136.9062
...


it's about an animal culture experiment, now I want to plot regression between diameter and carbon content. However, what I want to do is to separate the temperature level, to make a plot like this:
regression plot

now I want to add regression equation and R^2 value, and I followed the code of
regression equation code,

what I got was only the regression between carbon content and diameter, there were no separate results of different temperature(I want three regression line with three equations and R^2 value).

here is the code I used:

p<-ggplot(diameter_biomass2, aes(x=diameter, y=carbon,colour=temperature))+
geom_point(alpha=.5)+

labs(title="Relationship between diameter and biomass \n",
x="Diameter(μm)",
y="Carbon content(μg)")+
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(limits = c(0,300), expand = c(0, 0)) +

geom_smooth(method = "lm",se=F)+

theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.background=element_rect(fill = "white"),
panel.border=element_rect(colour="black",fill=NA,size=.5))
p

#add regression equation and R^2
lm_eqn <- function(diameter_biomass2){
m <- lm(carbon ~ diameter+temperature, diameter_biomass2);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}

p1 <- p + geom_text(x = 1000, y = 200, label = lm_eqn(diameter_biomass2), parse = TRUE)

p1


Any comments, suggestions will be high appriciate! Thank you so much!

Answer
library(purrr)
library(dplyr)

Using the example data your posted in your question

diameter_biomass2 <- read.table("~/Binfo/TST/Stack/test.txt", header = T)

taking a liberty of cohercing temp to factor because it will be our grouping variable

diameter_biomass2$temp %<>% as.factor()

p <- ggplot(diameter_biomass2, aes(x=diameter, y=carbon,colour=temp))+
geom_point(alpha=.5)+
labs(title="Relationship between diameter and biomass \n",
     x="Diameter(μm)",
     y="Carbon content(μg)")+
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(limits = c(0,300), expand = c(0, 0)) +
geom_smooth(method = "lm",se=F)+
theme(panel.grid.major=element_blank(),
      panel.grid.minor=element_blank(),
      panel.background=element_rect(fill = "white"),
      panel.border=element_rect(colour="black",fill=NA,size=.5))
p

Modify your existing function for extracting model coefficients

lm_eqn <- function(m){
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
                 list(a = format(coef(m)[1], digits = 2), 
                      b = format(coef(m)[2], digits = 2), 
                      r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));                 
}

Using library(purrr) build a model for each temperature group and extract the equations

put those equation into a dataframe with temp so we can color like our lines in the plot

eqns <- diameter_biomass2 %>% split(.$temp) %>%
map(~ lm(carbon ~ diameter, data = .)) %>%
map(lm_eqn) %>% 
do.call(rbind, .) %>%
as.data.frame() %>%
set_names("equation") %>%
mutate(temp = rownames(.))

p1 <- p + geom_text_repel(data = eqns,aes(x = -Inf, y = Inf,label = equation), parse = TRUE, segment.size = 0)
p1

enter image description here