armoschiano - 6 months ago 34

R Question

I have this data.frame with a lot of NAs:

`df <- data.frame(a = rep(letters[1:3], each = 3),`

b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7))

df

> df

a b

1 a NA

2 a NA

3 a NA

4 b 1

5 b NA

6 b 3

7 c NA

8 c NA

9 c 7

I would like to subset this dataframe to obtain only factor group rows that have no less than two values, such as this:

`a b`

1 b 1

2 b NA

3 b 3

I have tried this function but it doesn't work:

`subset(df, sum(!is.na(b)) < 1, by = a)`

> [1] a b

<0 rows> (or 0-length row.names)

Any suggestion? (other packages solutions are welcome)

Answer

We can use `data.table`

. Convert the 'data.frame' to 'data.table' (`setDT(df)`

), grouped by 'a', `if`

the `sum`

of logical vector (i.e. non-NA elements - `!is.na(b)`

) is greater than 1, then Subset the Data.table.

```
library(data.table)
setDT(df)[,if(sum(!is.na(b))>1) .SD , by = a]
# a b
#1: b 1
#2: b NA
#3: b 3
```

Or using `dplyr`

, with the same logic, after grouping by 'a', we `filter`

the rows.

```
library(dplyr)
df %>%
group_by(a) %>%
filter(sum(!is.na(b))>1)
# a b
# <fctr> <dbl>
#1 b 1
#2 b NA
#3 b 3
```

Or in `base R`

with `ave`

```
df[with(df, ave(b, a, FUN = function(x) sum(!is.na(x))>1)!=0),]
```