Question by Rishi, 4 years ago

Subsetting data.tables in R based on repeated rows

I have a list of data.tables stored in an object 'ddf' (a sample is shown below):

V1 V2 V3
1: b c a
2: b c a
3: b c a
4: b c a
5: b b a
6: b c a
7: b c a

V1 V2 V3
1: a c a
2: a c a
3: a c a
4: a c a
5: a c a

V1 V2 V3
1: a c b
2: a c b
3: a c b
4: a c b
5: a c b
6: a c b
7: a c b
8: a c b
9: a c b
... and so on, up to [[100]]

I want to subset the list 'ddf' so that the result contains only the data.tables that:

  1. have at least 9 rows, and

  2. have all of those rows identical.

I also want to store this subsetted output in a new object.

Here is the code I have written so far:

for(i in 1:100){
  m=(as.numeric(nrow(df[[i]]))>= 9)
  if(m == TRUE & df[[i]][1,] = df[[i]][2,] == df[[i]][3,] = df[[i]][4,] = df[[i]][5,] = df[[i]][6,] =

Please tell me what's wrong, and how I can generalize this to subsetting based on "n" identical rows.

Answer

If lst is your list, then:

lst[sapply(lst, nrow) >= 9 & sapply(lst, function(x) nrow(unique(x))) == 1]

should give you the desired result.


  • sapply(lst, nrow) >= 9 checks whether each data.table has nine or more rows.
  • sapply(lst, function(x) nrow(unique(x))) == 1 checks whether all the rows are the same: a table whose rows are all identical has exactly one unique row.
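To see what each of the two logical vectors contains, here is a minimal sketch that rebuilds three tables modelled on the sample printed in the question and evaluates both conditions separately. The construction of lst is an assumption for illustration (the real list has 100 elements), and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical sample modelled on the three tables printed in the question.
# Length-1 columns are recycled to the length of the longest column.
lst <- list(
  data.table(V1 = "b", V2 = c("c", "c", "c", "c", "b", "c", "c"), V3 = "a"),
  data.table(V1 = rep("a", 5), V2 = "c", V3 = "a"),
  data.table(V1 = rep("a", 9), V2 = "c", V3 = "b")
)

# Condition 1: nine or more rows
enough_rows <- sapply(lst, nrow) >= 9

# Condition 2: exactly one distinct row, i.e. all rows identical
all_same <- sapply(lst, function(x) nrow(unique(x))) == 1

print(enough_rows)  # expected: FALSE FALSE TRUE
print(all_same)     # expected: FALSE TRUE TRUE

# Only the third table satisfies both conditions
result <- lst[enough_rows & all_same]
print(length(result))  # expected: 1
```

The first table fails both tests (7 rows, and row 5 differs), the second has identical rows but only 5 of them, and only the third passes.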

Or, with a single sapply call, as @docendodiscimus suggested:

lst[sapply(lst, function(x) nrow(x) >= 9 & nrow(unique(x)) == 1)]
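The question also asked what was wrong with the loop and how to generalize to "n" rows. In the loop, `=` is assignment, not comparison (`==`); R does not allow chained comparisons such as `a == b == c`; comparing two rows with `==` yields one logical value per column rather than the single TRUE/FALSE that `if()` needs; the loop indexes `df` while the list is called `ddf`; and the row comparisons are hard-coded. A sketch that makes the threshold a parameter follows. Both function names are hypothetical, and `Filter()` is used here as an equivalent of the answer's logical indexing, not as part of the original answer. The loop variant tests "all rows identical" column by column (every column holds a single distinct value), which is equivalent to the `nrow(unique(x)) == 1` check.

```r
library(data.table)

# Hypothetical helper: keep the data.tables in `lst` that have at least
# `n` rows and whose rows are all identical (exactly one distinct row).
keep_repeated <- function(lst, n) {
  Filter(function(x) nrow(x) >= n && nrow(unique(x)) == 1, lst)
}

# Hypothetical loop version, closer in spirit to the question's attempt.
# `==`/`>=` are used for comparison; `=` would be assignment.
keep_repeated_loop <- function(ddf, n) {
  out <- list()
  for (i in seq_along(ddf)) {
    x <- ddf[[i]]
    # all rows identical <=> every column contains a single distinct value
    all_same <- all(vapply(x, function(col) length(unique(col)) == 1L,
                           logical(1)))
    if (nrow(x) >= n && all_same) {
      out[[length(out) + 1]] <- x  # store the matching table
    }
  }
  out
}

# Hypothetical sample modelled on the question's output.
ddf <- list(
  data.table(V1 = "b", V2 = c("c", "c", "c", "c", "b", "c", "c"), V3 = "a"),
  data.table(V1 = rep("a", 5), V2 = "c", V3 = "a"),
  data.table(V1 = rep("a", 9), V2 = "c", V3 = "b")
)

print(length(keep_repeated(ddf, 9)))       # expected: 1
print(length(keep_repeated(ddf, 5)))       # expected: 2
print(length(keep_repeated_loop(ddf, 5)))  # expected: 2
```

Storing the subset is then just an assignment, for example `ddf9 <- keep_repeated(ddf, 9)`. The vectorised version is usually preferable; the loop is shown only to make the fix to the original attempt explicit.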