Wildebeest - 7 months ago 33

R Question

I'm a newby in the R world.

I have a dataframe with 4 columns: location (with several factor levels each containing several rows), condition1, condition2, condition3 (those 3 columns have integer values).

I want to delete the location factors whose sum of the three condition columns is 0.

I don't want to delete each row that the sum of the three columns is 0 (that is simple and I have already done) because I want to keep the rows whit the sums=0 if it is not the fact that in each row of the location group is 0 the result.

Hope that this sounds understandable to someone...

Thanks in advance

Answer

If we need to remove the groups 'location' where all the values are 0, convert the 'data.frame' to 'data.table' (`setDT(my_df)`

- from the comments, it seems like the OP's dataset is `data.table`

), grouped by 'location', we specify the `.SDcols`

as the 'condition' columns, get the row wise `sum`

of the `.SD`

using `Reduce`

for each 'location', get the `sum`

of the `vector`

, check whether it is not equal to 0, find the rowindex of the logical vector with `.I`

, extract the column that has the row index (`$V1`

) and subset the rows of 'my_df'.

```
setDT(my_df)[my_df[, .I[sum(Reduce(`+`, .SD))!=0] , by = location,
.SDcols = condition1:condition3]$V1]
# location condition1 condition2 condition3
# 1: c02 1 5 4
# 2: c02 1 6 3
# 3: c02 0 0 0
# 4: c02 2 5 0
# 5: c02 5 0 0
# 6: c03 1 0 0
# 7: c03 0 4 4
# 8: c03 0 7 7
# 9: c03 5 3 9
#10: c03 4 8 2
#11: c05 0 3 0
#12: c05 0 0 0
#13: c05 5 5 5
#14: c05 1 0 1
#15: c05 0 7 5
```

Another variation of the above is

```
setDT(my_df)[my_df[, .I[Reduce(`+`, lapply(.SD, sum))!=0], by = location,
.SDcols = condition1:condition3]$V1]
```

Or `unlist`

the `.SD`

, get the `sum`

, check whether it is not equal to 0 and as mentioned above extracting the row index column will be used for subsetting the rows.

```
setDT(my_df)[my_df[, .I[sum(unlist(.SD))!=0], by = location]$V1]
```

```
location <- c(rep(c("c01", "c02", "c03", "c04", "c05"), each = 5))
condition1 <- c(0,0,0,0,0,1,1,0,2,5,1,0,0,5,4,0,0,0,0,0,0,0,5,1,0)
condition2 <- c(0,0,0,0,0,5,6,0,5,0,0,4,7,3,8,0,0,0,0,0,3,0,5,0,7)
condition3 <- c(0,0,0,0,0,4,3,0,0,0,0,4,7,9,2,0,0,0,0,0,0,0,5,1,5)
my_df <- data.frame(location, condition1, condition2, condition3)
```