Jimmy Jimmy - 3 months ago 51
R Question

Add p-value and r on ggplot [Follow-up]

It is a follow-up question. When I run the code given below, I get a plot with two R2 and p-value but p-value = 0. It is possibly due to very small p-value. I tried increasing no. of digits to 20 (here

signif(..p.value.., digits = 4)
) but it did not work. I would rather prefer to mention exact p-value or use stars e.g.,
if (p<0.002) star='**' else if (p>=0.002&p<0.05) star='*' else star=''
. In addition, I would like to have r values listed in the plot. Have a look and please let me know which part needs to be amended. Looking forward!

UPDATE

Answer codes by @eipi10 for adding p-value work, but still seeking answers on adding correlation coefficient (r) in the ggplots.

Code:

library(dplyr)
library(ggplot2)
library(ggpmisc)

df <- diamonds %>%
dplyr::filter(cut%in%c("Fair","Ideal")) %>%
dplyr::filter(clarity%in%c("I1" , "SI2" , "SI1" , "VS2" , "VS1", "VVS2")) %>%
dplyr::mutate(new_price = ifelse(cut == "Fair",
price* 0.5,
price * 1.1))

formula <- y ~ x - 1

p <- ggplot(df, aes(x,y, color=factor(cut)))
p <- p + stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95)
p <- p + geom_point(alpha = 0.3)
p <- p + stat_poly_eq(aes(label = paste(..rr.label..)),
label.x.npc = "right", label.y.npc = 0.15, formula = formula,
parse = TRUE, size = 3) +
stat_fit_glance(method = 'lm', method.args = list(formula = formula),
geom = 'text', aes(label = paste("P-value = ",
signif(..p.value.., digits = 4), sep = "")),label.x.npc = 'right',
label.y.npc = 0.35, size = 3)
print(p)


enter image description here

Answer

This is a large data set and you can see from the graph that the fit is almost perfect, which means the p-value for the regression is going to be miniscule. Below are regression models for each of the two levels of cut. To save space, only the key parts of the model summaries are shown:

lapply(unique(df$cut), function(g) {
  summary(lm(y ~ x - 1, df %>% filter(cut==g)))
})
cut=="Ideal"
...
Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x 1.001715   0.000269    3724   <2e-16 ***
...
Residual standard error: 0.2079 on 18291 degrees of freedom
Multiple R-squared:  0.9987,  Adjusted R-squared:  0.9987 
F-statistic: 1.387e+07 on 1 and 18291 DF,  p-value: < 2.2e-16

cut=="Fair"
...
Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
x 0.9895032  0.0004096    2416   <2e-16 ***
...
Residual standard error: 0.1033 on 1583 degrees of freedom
Multiple R-squared:  0.9997,  Adjusted R-squared:  0.9997 
F-statistic: 5.836e+06 on 1 and 1583 DF,  p-value: < 2.2e-16

Note the enormous F statistics. The p-values for such large F statistics are essentially zero.

pf(5.836e06, 1, 1583, lower=FALSE)  
[1] 0

Any F statistic above about 2,400 (for the given degrees of freedom) will give a p-value below the smallest non-zero number that R can represent.

pf(2400, 1, 1583, lower=FALSE)
[1] 1.716433e-319

By default, when R rounds a number, it doesn't return trailing zeros (try round(1.340000, 5) or signif(1.340000,5)). To get more zeros printed, you can, for example, use sprintf to format the output string. The code below formats the p-value in scientific notation. For decimal numbers replace "%1.4e" with "%1.4f". See the help for more details on format strings:

p <- ggplot(df, aes(x,y, color=cut)) + 
  stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) + 
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula, 
               parse=TRUE, size = 3) + 
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom='text', aes(label=paste0("P-value = ", sprintf("%1.4e", ..p.value..))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)

enter image description here

UPDATE: To add starred p-value ranges, one option is to use ifelse statements with p-value ranges as the conditions:

p <- ggplot(df, aes(x,y, color=cut)) + 
  stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) + 
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula, 
               parse=TRUE, size = 3) + 
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom='text', aes(label=ifelse(..p.value..< 0.001, "p<0.001**", 
                                                ifelse(..p.value..>=0.001 & ..p.value..<0.05, "p<0.05*", "p>0.05"))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)

enter image description here