Wildebeest - 1 year ago 80
R Question

# How to remove group of rows in R based on sum of 3 columns

I'm a newby in the R world.

I have a dataframe with 4 columns: location (with several factor levels each containing several rows), condition1, condition2, condition3 (those 3 columns have integer values).
I want to delete the location factors whose sum of the three condition columns is 0.

I don't want to delete each row that the sum of the three columns is 0 (that is simple and I have already done) because I want to keep the rows whit the sums=0 if it is not the fact that in each row of the location group is 0 the result.

Hope that this sounds understandable to someone...

If we need to remove the groups 'location' where all the values are 0, convert the 'data.frame' to 'data.table' (`setDT(my_df)` - from the comments, it seems like the OP's dataset is `data.table`), grouped by 'location', we specify the `.SDcols` as the 'condition' columns, get the row wise `sum` of the `.SD` using `Reduce` for each 'location', get the `sum` of the `vector`, check whether it is not equal to 0, find the rowindex of the logical vector with `.I`, extract the column that has the row index (`\$V1`) and subset the rows of 'my_df'.

``````setDT(my_df)[my_df[, .I[sum(Reduce(`+`, .SD))!=0] , by = location,
.SDcols = condition1:condition3]\$V1]
#    location condition1 condition2 condition3
# 1:      c02          1          5          4
# 2:      c02          1          6          3
# 3:      c02          0          0          0
# 4:      c02          2          5          0
# 5:      c02          5          0          0
# 6:      c03          1          0          0
# 7:      c03          0          4          4
# 8:      c03          0          7          7
# 9:      c03          5          3          9
#10:      c03          4          8          2
#11:      c05          0          3          0
#12:      c05          0          0          0
#13:      c05          5          5          5
#14:      c05          1          0          1
#15:      c05          0          7          5
``````

Another variation of the above is

``````setDT(my_df)[my_df[,  .I[Reduce(`+`, lapply(.SD, sum))!=0], by = location,
.SDcols = condition1:condition3]\$V1]
``````

Or `unlist` the `.SD`, get the `sum`, check whether it is not equal to 0 and as mentioned above extracting the row index column will be used for subsetting the rows.

``````setDT(my_df)[my_df[,  .I[sum(unlist(.SD))!=0], by = location]\$V1]
``````

### data

``````location <- c(rep(c("c01", "c02", "c03", "c04", "c05"), each = 5))
condition1 <- c(0,0,0,0,0,1,1,0,2,5,1,0,0,5,4,0,0,0,0,0,0,0,5,1,0)
condition2 <- c(0,0,0,0,0,5,6,0,5,0,0,4,7,3,8,0,0,0,0,0,3,0,5,0,7)
condition3 <- c(0,0,0,0,0,4,3,0,0,0,0,4,7,9,2,0,0,0,0,0,0,0,5,1,5)
my_df <- data.frame(location, condition1, condition2, condition3)
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download