user2904120 - 12 days ago
R Question

Efficient way of subsetting a dataframe in R using multiple conditions

Is there a clearer and more efficient way to subset a data frame in R using multiple conditions?
Here is my simplified example. The columns form two triplicates (v1, v2, v3 and v4, v5, v6). Each triplicate may contain at most one 0 per row; rows that violate this in either triplicate should be excluded:

v1 v2 v3 v4 v5 v6
1 0 3 0 0 2
1 1 1 1 2 0
0 0 0 1 1 0
0 0 0 0 0 0


Here is my simple way of approaching the problem.

data_short <- subset(data, ((v1 != 0 & v2 != 0) | (v1 != 0 & v3 != 0) | (v2 != 0 & v3 != 0)) & ((v4 != 0 & v5 != 0) | (v4 != 0 & v6 != 0) | (v5 != 0 & v6 != 0)))

v1 v2 v3 v4 v5 v6
1 1 1 1 2 0

Answer

You can use rowSums to count the zeros in each row, separately for the first three and the last three columns, and keep only the rows where both counts are at most 1:

df <- read.table(text="v1  v2  v3  v4  v5  v6
1   0   3   0   0   2
1   1   1   1   2   0
0   0   0   1   1   0
0   0   0   0   0   0", header=TRUE)

df[rowSums(df[,1:3]==0)<=1 & rowSums(df[,4:6]==0)<=1,]

  v1 v2 v3 v4 v5 v6
2  1  1  1  1  2  0
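
If there are more than two triplicates, the same rowSums idea can be wrapped in a small function. The following is only a sketch: the helper name filter_by_zero_count and its arguments groups and max_zeros are made up for illustration and are not part of the original answer. It assumes the data contain no NA values.

```r
# Example data from the question.
df <- read.table(text = "v1 v2 v3 v4 v5 v6
1 0 3 0 0 2
1 1 1 1 2 0
0 0 0 1 1 0
0 0 0 0 0 0", header = TRUE)

# Hypothetical helper: keep the rows in which every column group
# contains at most `max_zeros` zeros.
filter_by_zero_count <- function(data, groups, max_zeros = 1) {
  # One logical vector per group: TRUE where the row passes for that group.
  checks <- lapply(groups, function(cols) {
    rowSums(data[, cols, drop = FALSE] == 0) <= max_zeros
  })
  # A row is kept only if it passes the check for all groups.
  data[Reduce(`&`, checks), , drop = FALSE]
}

# Column groups are given by name, so column order does not matter.
groups <- list(c("v1", "v2", "v3"), c("v4", "v5", "v6"))
result <- filter_by_zero_count(df, groups)
print(result)
```

For the example data this should keep only row 2, the same row as the one-liner above. Referring to columns by name instead of position makes the filter less fragile if columns are reordered.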