Sergey Aldoukhov - 2 months ago
R Question

R split-apply-combine with dplyr - how to keep an NA row when slice returns nothing for a group

library(dplyr)

mtcars %>% select(mpg, cyl) %>% group_by(cyl) %>% arrange(mpg) %>% slice(8)


outputs:

    mpg   cyl
  <dbl> <dbl>
1  30.4     4
2  15.2     8


As you can see, it does not produce a row for the 6-cylinder group, which has only 7 cars and therefore no 8th row. What is the recommended way to keep all the groups, returning NA for a group when its result is empty?
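A quick way to confirm why the 6-cylinder group disappears is to count the rows in each group: slice(8) can only return a row for groups with at least 8 members. A small sketch (group_sizes is just an illustrative name):

```r
library(dplyr)

# Count rows per cylinder group; slice(8) returns nothing for any group
# with fewer than 8 rows.
group_sizes <- mtcars %>% count(cyl)
group_sizes
```

This should show 11, 7 and 14 cars for 4, 6 and 8 cylinders respectively, so only the 6-cylinder group is too small.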

Answer

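To select one row from each group while keeping NAs, you can subset inside summarise_all. Indexing a vector past its end returns NA (for example, (1:7)[8] is NA), so the 6-cylinder group yields a row of NAs instead of being dropped: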

library(dplyr)

mtcars %>% group_by(cyl) %>% 
    arrange(mpg) %>% 
    # .x[8] is NA when a group has fewer than 8 rows, so no group is dropped
    summarise_all(~ .x[8])

## # A tibble: 3 × 11
##     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4  30.4  75.7    52  4.93 1.615 18.52     1     1     4     2
## 2     6    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3     8  15.2 304.0   150  3.15 3.435 17.30     0     0     3     2
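If you would rather keep slice() itself, one alternative (a sketch, not the approach above) is to slice as in the question and then left-join the result onto the distinct cyl values, so groups that produced no row come back filled with NA. The names eighth, all_groups and result are illustrative:

```r
library(dplyr)

# 8th row by ascending mpg within each cylinder group; groups with
# fewer than 8 rows produce no row at this step.
eighth <- mtcars %>%
  select(mpg, cyl) %>%
  group_by(cyl) %>%
  arrange(mpg, .by_group = TRUE) %>%
  slice(8) %>%
  ungroup()

# Join onto the full set of groups so empty groups reappear with NA.
all_groups <- mtcars %>% distinct(cyl)
result <- all_groups %>%
  left_join(eighth, by = "cyl") %>%
  arrange(cyl)
result
```

The join approach generalizes to any per-group operation that may return zero rows, not just positional subsetting.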

However, as @Frank noted, this won't extend nicely to selecting multiple rows per group in this format, because summarise demands a single result row for each group. To subset, say, rows 7 and 8 of each group, wrap each column's values in a list column and then expand it with tidyr::unnest:

library(tidyverse)

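mtcars %>% group_by(cyl) %>% 
    arrange(mpg) %>% 
    # wrap each column's 7th and 8th values in a list so summarise gets one value per group
    summarise_all(~ list(.x[7:8])) %>% 
    # expand every list column (everything except cyl) back into one row per element
    unnest(cols = -cyl)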

## # A tibble: 6 × 11
##     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4  27.3  79.0    66  4.08 1.935 18.90     1     1     4     1
## 2     4  30.4  75.7    52  4.93 1.615 18.52     1     1     4     2
## 3     6  21.4 258.0   110  3.08 3.215 19.44     1     0     3     1
## 4     6    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 5     8  15.2 275.8   180  3.07 3.780 18.00     0     0     3     3
## 6     8  15.2 304.0   150  3.15 3.435 17.30     0     0     3     2
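In dplyr 1.1.0 and later, reframe() is intended for exactly this case (a per-group result of any length), so the list-column and unnest step can be dropped. A minimal sketch, assuming dplyr >= 1.1.0 is installed; rows_7_8 is an illustrative name:

```r
library(dplyr)

# reframe() (dplyr >= 1.1.0) lets each group return any number of rows.
# .x[7:8] pads short groups with NA, so every group returns two rows.
rows_7_8 <- mtcars %>%
  group_by(cyl) %>%
  arrange(mpg) %>%
  reframe(across(everything(), ~ .x[7:8]))
rows_7_8
```

This should give the same six rows as the unnest version, with the 6-cylinder group's second row all NA.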