armoschiano - 1 year ago 85
R Question

r subset rows by criteria and by factor group

I have this data.frame with a lot of NAs:

``````df <- data.frame(a = rep(letters[1:3], each = 3),
b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7))
df
> df
a  b
1 a NA
2 a NA
3 a NA
4 b  1
5 b NA
6 b  3
7 c NA
8 c NA
9 c  7
``````

I would like to subset this dataframe to obtain only factor group rows that have no less than two values, such as this:

``````  a  b
1 b 1
2 b NA
3 b 3
``````

I have tried this function but it doesn't work:

``````subset(df, sum(!is.na(b)) < 1, by = a)

> [1] a b
<0 rows> (or 0-length row.names)
``````

Any suggestion? (other packages solutions are welcome)

We can use `data.table`. Convert the 'data.frame' to 'data.table' (`setDT(df)`), grouped by 'a', `if` the `sum` of logical vector (i.e. non-NA elements - `!is.na(b)`) is greater than 1, then Subset the Data.table.

``````library(data.table)
setDT(df)[,if(sum(!is.na(b))>1) .SD , by = a]
#   a  b
#1: b  1
#2: b NA
#3: b  3
``````

Or using `dplyr`, with the same logic, after grouping by 'a', we `filter` the rows.

``````library(dplyr)
df %>%
group_by(a) %>%
filter(sum(!is.na(b))>1)
#      a     b
#  <fctr> <dbl>
#1      b     1
#2      b    NA
#3      b     3
``````

Or in `base R` with `ave`

``````df[with(df, ave(b, a, FUN = function(x) sum(!is.na(x))>1)!=0),]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download