pv7 - 1 month ago
R Question

Using index to reference column in summarise() in dplyr - R

I would like to reference a column inside summarise() in dplyr by its index rather than by its name. For example:

> a

id visit timepoint bedroom den
1 12 0 62 NA
2 14 0 53 6.00
3 14 0 56 2.75
4 14 1 55 NA
5 14 2 61 NA
6 15 0 54 NA
7 15 1 58 2.75
8 16 2 59 NA
9 16 2 60 NA
10 17 1 57 NA

# E.g.
a %>% group_by(visit) %>% summarise(avg.bedroom = mean(bedroom, na.rm = TRUE))
# Returns
visit avg.bedroom
<dbl> <dbl>
1 0 4.375
2 1 2.750
3 2 NaN


How could I use the index of the column "bedroom" rather than its name in the summarise() call? I tried:

a %>% group_by(visit) %>% summarise("4" = mean(.[[4]], na.rm = T))


but this returned incorrect results (the same value for every group):

visit `4`
<dbl> <dbl>
1 0 3.833333
2 1 3.833333
3 2 3.833333


Is my objective achievable, and if so, how? Thank you.

Answer

Perhaps not exactly what you're looking for, but one option would be to use purrr rather than dplyr. Something like:

library(purrr)

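# Split mtcars into one data frame per value of cyl,
# then take the mean of column 4 (hp) within each piece
mtcars %>%
    split(.$cyl) %>%
    map_dbl(function(x) mean(x[, 4]))

#         4         6         8
#  82.63636 122.28571 209.21429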
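If you would rather stay within dplyr, here is a minimal sketch of one possible approach, not necessarily the only or best one. The attempt with .[[4]] gives the same number for every group because the dot refers to the whole data frame piped into summarise(), so the mean is taken over the full column rather than per group. One way around this is to look up the column name from its index first and then refer to it through the .data pronoun. The variable names col and res below are just illustrative, and mtcars is used so the result can be compared with the purrr output above.

```r
library(dplyr)

# Resolve the column name from its position once, outside the pipeline.
# In mtcars, column 4 is "hp".
col <- names(mtcars)[4]

# .data[[col]] is evaluated within each group, unlike .[[4]],
# which always sees the full, ungrouped column.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(.data[[col]], na.rm = TRUE))

res
```

The avg column should hold the same per-group means as the purrr version. For the data frame in the question, the same pattern would be names(a)[4] followed by group_by(visit).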