armoschiano armoschiano - 4 months ago 15
R Question

r subset rows by criteria and by factor group

I have this data.frame with a lot of NAs:

df <- data.frame(a = rep(letters[1:3], each = 3),
b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7))
df
> df
a b
1 a NA
2 a NA
3 a NA
4 b 1
5 b NA
6 b 3
7 c NA
8 c NA
9 c 7


I would like to subset this dataframe to obtain only factor group rows that have no less than two values, such as this:

a b
1 b 1
2 b NA
3 b 3


I have tried this function but it doesn't work:

subset(df, sum(!is.na(b)) < 1, by = a)

> [1] a b
<0 rows> (or 0-length row.names)


Any suggestion? (other packages solutions are welcome)

Answer

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'a', if the sum of logical vector (i.e. non-NA elements - !is.na(b)) is greater than 1, then Subset the Data.table.

library(data.table)
setDT(df)[,if(sum(!is.na(b))>1) .SD , by = a]
#   a  b
#1: b  1
#2: b NA
#3: b  3

Or using dplyr, with the same logic, after grouping by 'a', we filter the rows.

library(dplyr)
df %>% 
    group_by(a) %>%
    filter(sum(!is.na(b))>1)
#      a     b
#  <fctr> <dbl>
#1      b     1
#2      b    NA
#3      b     3

Or in base R with ave

df[with(df, ave(b, a, FUN = function(x) sum(!is.na(x))>1)!=0),]