PaoloH - 1 year ago 119
R Question

# R plot with ggplot2 linear regression with a transformed dependent variable

So my problem is quite simple, I want to plot the exponential regression of my data, so far what I've done was plot the polynomial regression :

``````ggplot(data = mydataANOVA, aes(x = TpsenJour, y = PrptionJourAuDelaDe, color = Type_Contrat))+
geom_point()+
geom_smooth(method ="lm", formula = y ~ poly(x,2))
``````

I got the following plot :

The regression isn't exactly a good fit for the actual data but the data looked a lot like an exponential function so I used the logarithm of my data, I used the same code with the log of my data to get the following plot:

Which seemed to be an very accurate fit, so I wanted to compare the two regression models directly by plotting the exponential regression, but when i used the formula
`formula = y ~ exp(poly(x,2))`
, I didn't get the same accurate regression and instead I got something else :

Which is even less accurate than the first one. How am I supposed to get that polynomial exponential regression plotted with the confidence intervals. I managed to get the good regression on the regular plot function but not with the confidence intervals and not in ggplot2. Here is what I got for only one of the two curves:

How can I get the good regression on ggplot2 with the confidence intervals ?
Here is the data that I used on one of the 2 curves.

``````      TpsenJour PrptionJourAuDelaDe fact
1             1            0.955669    a
3             3            0.877947    a
5             5            0.815058    a
7             7            0.764725    a
9             9            0.721070    a
11           11            0.681675    a
13           13            0.646490    a
15           15            0.614689    a
17           17            0.585664    a
19           19            0.558905    a
21           21            0.534362    a
23           23            0.511791    a
25           25            0.490651    a
27           27            0.470923    a
29           29            0.452498    a
31           31            0.435190    a
33           33            0.419160    a
35           35            0.404359    a
37           37            0.390519    a
40           40            0.371018    a
40.1         40            0.371018    a
43           43            0.352960    a
46           46            0.336170    a
49           49            0.320631    a
52           52            0.306194    a
55           55            0.292584    a
58           58            0.279858    a
62           62            0.264096    a
65           65            0.253316    a
68           68            0.243120    a
71           71            0.233544    a
74           74            0.224474    a
77           77            0.215905    a
81           81            0.205180    a
84           84            0.197623    a
87           87            0.190440    a
90           90            0.183609    a
93           93            0.177278    a
96           96            0.171358    a
100         100            0.163951    a
``````

Thank you.

I had a smal problem with the answer of @Roland, where it would return an error, I think I've solved it. I simply had to add two lines : (I hope that by fixing my error I did not alter the originally predicted outcome)

``````fact<-DF\$fact
fit <- lm(log(PrptionJourAuDelaDe) ~ poly(TpsenJour, 2) * fact, data = mydataANOVA)
fact<-c(rep('a',1000),rep('b',1000))
pred <- expand.grid(TpsenJour = seq(min(mydataANOVA\$TpsenJour), max(mydataANOVA\$TpsenJour), length.out = 1e3),
fact = unique(mydataANOVA\$fact))

pred <- cbind(pred,
exp(predict(fit,
newdata = data.frame(TpsenJour = pred\$TpsenJour),
interval = "confidence")))

ggplot(data = DF, aes(x = TpsenJour, y = PrptionJourAuDelaDe, color = fact)) +
geom_point() +
geom_ribbon(data = pred, aes(y = fit, ymin = lwr, ymax = upr, fill = fact), alpha = 0.3) +
geom_line(data = pred, aes(y = fit, color = fact))
``````

I then get the following plot:

Do the fit outside of ggplot2:

``````fit <- lm(log(PrptionJourAuDelaDe) ~ poly(TpsenJour, 2) * fact, data = DF)
pred <- expand.grid(TpsenJour = seq(min(DF\$TpsenJour), max(DF\$TpsenJour), length.out = 1e3),
fact = unique(DF\$fact))
pred <- cbind(pred,
exp(predict(fit,
newdata = data.frame(TpsenJour = pred\$TpsenJour),
interval = "confidence")))

library(ggplot2)
ggplot(data = DF, aes(x = TpsenJour, y = PrptionJourAuDelaDe, color = fact)) +
geom_point() +
geom_ribbon(data = pred, aes(y = fit, ymin = lwr, ymax = upr, fill = fact), alpha = 0.3) +
geom_line(data = pred, aes(y = fit, color = fact))
``````

Note that I wouldn't call this exponential regression. It's linear regression with a transformed dependent variable (in contrast to a non-linear model, which would need to be fitted with `nls`). And its probably not the model I would use.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download