Rishi Rishi -4 years ago 152
R Question

Subsetting a list of data.tables in R based on repeated rows

I have a list of data tables stored in an object 'ddf' (a sample is shown below):

[[43]]
V1 V2 V3
1: b c a
2: b c a
3: b c a
4: b c a
5: b b a
6: b c a
7: b c a

[[44]]
V1 V2 V3
1: a c a
2: a c a
3: a c a
4: a c a
5: a c a

[[45]]
V1 V2 V3
1: a c b
2: a c b
3: a c b
4: a c b
5: a c b
6: a c b
7: a c b
8: a c b
9: a c b
... and so on, up to [[100]]


I want to subset the list "ddf" so that the result contains only those data tables which:


  1. have at least 9 rows, and

  2. have all 9 of those rows identical.

I also want to store this subsetted output.



I have written some code for this below:

for(i in 1:100){
m=(as.numeric(nrow(df[[i]]))>= 9)
if(m == TRUE & df[[i]][1,] = df[[i]][2,] =
=df[[i]][3,] =df[[i]][4,] =df[[i]][5,] =df[[i]][6,]=
df[[i]][7,]=df[[i]][8,]=df[[i]][9,]){
print(df[[i]])
}}


Please tell me what's wrong and how I can generalize this to subset based on "n" identical rows.

Answer

If lst is your list, then:

lst[sapply(lst, nrow) >= 9 & sapply(lst, function(x) nrow(unique(x))) == 1]

should give you the desired result.

Where:

  • sapply(lst, nrow) >= 9 checks whether each data table has nine or more rows
  • sapply(lst, function(x) nrow(unique(x))) == 1 checks whether all the rows are the same, since only one unique row remains in that case.

Or with one sapply call as @docendodiscimus suggested:

lst[sapply(lst, function(x) nrow(x) >= 9 & nrow(unique(x)) == 1)]
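
As for what is wrong with the loop in the question: = is assignment in R while comparison is ==, comparisons cannot be chained the way the if condition tries to, and the loop indexes df although the list is called ddf. To generalize to "n" identical rows, the one-liner above can be wrapped in a small function. This is only a sketch: keep_repeated is a made-up name, the toy list is modelled on the sample in the question, and it assumes "n similar rows" means a table with at least n rows that are all identical.

```r
library(data.table)

# Hypothetical helper (name is an assumption, not part of data.table):
# keep only the tables that have at least n rows, all of them identical.
keep_repeated <- function(lst, n) {
  keep <- vapply(
    lst,
    # && short-circuits, so unique() only runs on tables with enough rows
    function(x) nrow(x) >= n && nrow(unique(x)) == 1L,
    logical(1)
  )
  lst[keep]
}

# Toy list modelled on the sample shown in the question
lst <- list(
  data.table(V1 = rep("b", 7), V2 = c(rep("c", 4), "b", "c", "c"), V3 = rep("a", 7)),
  data.table(V1 = rep("a", 5), V2 = rep("c", 5), V3 = rep("a", 5)),
  data.table(V1 = rep("a", 9), V2 = rep("c", 9), V3 = rep("b", 9))
)

# Assigning the result stores the subset for later use
res9 <- keep_repeated(lst, 9)
res5 <- keep_repeated(lst, 5)
```

With this toy list, res9 holds only the third table, and res5 holds the second and third. The first table is dropped in both cases because its fifth row differs from the others.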