Jimmy - 1 year ago 164

R Question

It is a follow-up question. When I run the code given below, I get a plot with two R2 and p-value but p-value = 0. It is possibly due to very small p-value. I tried increasing no. of digits to 20 (here

`signif(..p.value.., digits = 4)`

`if (p<0.002) star='**' else if (p>=0.002&p<0.05) star='*' else star=''`

Answer codes by @eipi10 for adding p-value work, but still

Code:

`library(dplyr)`

library(ggplot2)

library(ggpmisc)

df <- diamonds %>%

dplyr::filter(cut%in%c("Fair","Ideal")) %>%

dplyr::filter(clarity%in%c("I1" , "SI2" , "SI1" , "VS2" , "VS1", "VVS2")) %>%

dplyr::mutate(new_price = ifelse(cut == "Fair",

price* 0.5,

price * 1.1))

formula <- y ~ x - 1

p <- ggplot(df, aes(x,y, color=factor(cut)))

p <- p + stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95)

p <- p + geom_point(alpha = 0.3)

p <- p + stat_poly_eq(aes(label = paste(..rr.label..)),

label.x.npc = "right", label.y.npc = 0.15, formula = formula,

parse = TRUE, size = 3) +

stat_fit_glance(method = 'lm', method.args = list(formula = formula),

geom = 'text', aes(label = paste("P-value = ",

signif(..p.value.., digits = 4), sep = "")),label.x.npc = 'right',

label.y.npc = 0.35, size = 3)

print(p)

Answer Source

This is a large data set and you can see from the graph that the fit is almost perfect, which means the p-value for the regression is going to be miniscule. Below are regression models for each of the two levels of `cut`

. To save space, only the key parts of the model summaries are shown:

```
lapply(unique(df$cut), function(g) {
summary(lm(y ~ x - 1, df %>% filter(cut==g)))
})
```

`cut=="Ideal" ... Coefficients: Estimate Std. Error t value Pr(>|t|) x 1.001715 0.000269 3724 <2e-16 *** ... Residual standard error: 0.2079 on 18291 degrees of freedom Multiple R-squared: 0.9987, Adjusted R-squared: 0.9987 F-statistic: 1.387e+07 on 1 and 18291 DF, p-value: < 2.2e-16 cut=="Fair" ... Coefficients: Estimate Std. Error t value Pr(>|t|) x 0.9895032 0.0004096 2416 <2e-16 *** ... Residual standard error: 0.1033 on 1583 degrees of freedom Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997 F-statistic: 5.836e+06 on 1 and 1583 DF, p-value: < 2.2e-16`

Note the enormous F statistics. The p-values for such large F statistics are essentially zero.

```
pf(5.836e06, 1, 1583, lower=FALSE)
```

`[1] 0`

Any F statistic above about 2,400 (for the given degrees of freedom) will give a p-value below the smallest non-zero number that R can represent.

```
pf(2400, 1, 1583, lower=FALSE)
```

`[1] 1.716433e-319`

By default, when R rounds a number, it doesn't return trailing zeros (try `round(1.340000, 5)`

or `signif(1.340000,5)`

). To get more zeros printed, you can, for example, use `sprintf`

to format the output string. The code below formats the p-value in scientific notation. For decimal numbers replace `"%1.4e"`

with `"%1.4f"`

. See the help for more details on format strings:

```
p <- ggplot(df, aes(x,y, color=cut)) +
stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) +
geom_point(alpha = 0.3) +
stat_poly_eq(aes(label = paste(..rr.label..)),
label.x.npc = "right", label.y.npc = 0.15, formula = formula,
parse=TRUE, size = 3) +
stat_fit_glance(method = 'lm', method.args = list(formula = formula),
geom='text', aes(label=paste0("P-value = ", sprintf("%1.4e", ..p.value..))),
label.x.npc = 'right',
label.y.npc = 0.4, size = 3)
```

**UPDATE:** To add starred p-value ranges, one option is to use `ifelse`

statements with p-value ranges as the conditions:

```
p <- ggplot(df, aes(x,y, color=cut)) +
stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) +
geom_point(alpha = 0.3) +
stat_poly_eq(aes(label = paste(..rr.label..)),
label.x.npc = "right", label.y.npc = 0.15, formula = formula,
parse=TRUE, size = 3) +
stat_fit_glance(method = 'lm', method.args = list(formula = formula),
geom='text', aes(label=ifelse(..p.value..< 0.001, "p<0.001**",
ifelse(..p.value..>=0.001 & ..p.value..<0.05, "p<0.05*", "p>0.05"))),
label.x.npc = 'right',
label.y.npc = 0.4, size = 3)
```