Wildebeest Wildebeest - 2 months ago 14
R Question

How to remove group of rows in R based on sum of 3 columns

I'm a newby in the R world.

I have a dataframe with 4 columns: location (with several factor levels each containing several rows), condition1, condition2, condition3 (those 3 columns have integer values).
I want to delete the location factors whose sum of the three condition columns is 0.

I don't want to delete each row that the sum of the three columns is 0 (that is simple and I have already done) because I want to keep the rows whit the sums=0 if it is not the fact that in each row of the location group is 0 the result.

Hope that this sounds understandable to someone...
Thanks in advance

Answer

If we need to remove the groups 'location' where all the values are 0, convert the 'data.frame' to 'data.table' (setDT(my_df) - from the comments, it seems like the OP's dataset is data.table), grouped by 'location', we specify the .SDcols as the 'condition' columns, get the row wise sum of the .SD using Reduce for each 'location', get the sum of the vector, check whether it is not equal to 0, find the rowindex of the logical vector with .I, extract the column that has the row index ($V1) and subset the rows of 'my_df'.

setDT(my_df)[my_df[, .I[sum(Reduce(`+`, .SD))!=0] , by = location, 
                      .SDcols = condition1:condition3]$V1]
#    location condition1 condition2 condition3
# 1:      c02          1          5          4
# 2:      c02          1          6          3
# 3:      c02          0          0          0
# 4:      c02          2          5          0
# 5:      c02          5          0          0
# 6:      c03          1          0          0
# 7:      c03          0          4          4
# 8:      c03          0          7          7
# 9:      c03          5          3          9
#10:      c03          4          8          2
#11:      c05          0          3          0
#12:      c05          0          0          0
#13:      c05          5          5          5
#14:      c05          1          0          1
#15:      c05          0          7          5

Another variation of the above is

setDT(my_df)[my_df[,  .I[Reduce(`+`, lapply(.SD, sum))!=0], by = location, 
                    .SDcols = condition1:condition3]$V1]

Or unlist the .SD, get the sum, check whether it is not equal to 0 and as mentioned above extracting the row index column will be used for subsetting the rows.

setDT(my_df)[my_df[,  .I[sum(unlist(.SD))!=0], by = location]$V1]

data

location <- c(rep(c("c01", "c02", "c03", "c04", "c05"), each = 5))
condition1 <- c(0,0,0,0,0,1,1,0,2,5,1,0,0,5,4,0,0,0,0,0,0,0,5,1,0)
condition2 <- c(0,0,0,0,0,5,6,0,5,0,0,4,7,3,8,0,0,0,0,0,3,0,5,0,7)
condition3 <- c(0,0,0,0,0,4,3,0,0,0,0,4,7,9,2,0,0,0,0,0,0,0,5,1,5)
my_df <- data.frame(location, condition1, condition2, condition3) 
Comments