nik nik - 2 months ago 11
R Question

how to remove rows when they don't meet a criteria

I have been busy with a data which looks like below

df<- structure(list(V1 = structure(1:6, .Label = c("A", "B", "C",
"D", "E", "F"), class = "factor"), V2 = structure(1:6, .Label = c("AA",
"BB", "CC", "DD", "EE", "FF"), class = "factor"), V3 = structure(c(6L,
5L, 4L, 1L, 3L, 2L), .Label = c("hddu", "jfhu", "jshsg", "kduf",
"Tlsu", "Trsm"), class = "factor"), V4 = c(1L, 2L, 0L, 0L, 5L,
6L), V5 = c(0L, 2L, 0L, 4L, 0L, 5L), V6 = c(0L, 0L, 4L, 6L, 0L,
7L), V7 = c(0L, 0L, 5L, 0L, 0L, 8L), V8 = c(0L, 0L, 0L, 0L, 6L,
0L), V9 = c(0L, 0L, 0L, 7L, 0L, 0L)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9"), class = "data.frame", row.names = c(NA,
-6L))


looks like this

V1 V2 V3 V4 V5 V6 V7 V8 V9
1 A AA Trsm 1 0 0 0 0 0
2 B BB Tlsu 2 2 0 0 0 0
3 C CC kduf 0 0 4 5 0 0
4 D DD hddu 0 4 6 0 0 7
5 E EE jshsg 5 0 0 0 6 0
6 F FF jfhu 6 5 7 8 0 0


what i want is to remove the rows that they don't have value for at least 2 columns consecutively. For example, they should have for first 2 column, or second two column or third two columns , if they have more, it is ok.
I want to detect them, and also make an output without them
in this case

row number 1 , 4 and 5. so I need two output

1- index 1, 4 and 5 (this shows which rows are removed)
2- The expected output is like this

B BB Tlsu 2 2 0 0 0 0
C CC kduf 0 0 4 5 0 0
F FF jfhu 6 5 7 8 0 0

Answer

You can manually pick up two data frames with index which shift horizontally by one and use vectorized & to find out if there is any consecutive TRUE and collect the row wise condition with rowSums as filtering index:

df[rowSums(df[4:8] & df[5:9]) != 0, ]

#   V1 V2   V3 V4 V5 V6 V7 V8 V9
# 2  B BB Tlsu  2  2  0  0  0  0
# 3  C CC kduf  0  0  4  5  0  0
# 4  D DD hddu  0  4  6  0  0  7
# 6  F FF jfhu  6  5  7  8  0  0

If the columns have to be paired every other column, seq can be used to generate necessary index:

df[rowSums(df[seq(4, 9, 2)] & df[seq(5, 9, 2)]) != 0, ]

#  V1 V2   V3 V4 V5 V6 V7 V8 V9
#2  B BB Tlsu  2  2  0  0  0  0
#3  C CC kduf  0  0  4  5  0  0
#6  F FF jfhu  6  5  7  8  0  0