Rishi · 4 years ago · 152
R Question

# Subsetting a list of data.tables in R based on repeated rows

I have a list of data.tables stored in an object `ddf` (a sample is shown below):

``````
[[43]]
   V1 V2 V3
1:  b  c  a
2:  b  c  a
3:  b  c  a
4:  b  c  a
5:  b  b  a
6:  b  c  a
7:  b  c  a

[[44]]
   V1 V2 V3
1:  a  c  a
2:  a  c  a
3:  a  c  a
4:  a  c  a
5:  a  c  a

[[45]]
   V1 V2 V3
1:  a  c  b
2:  a  c  b
3:  a  c  b
4:  a  c  b
5:  a  c  b
6:  a  c  b
7:  a  c  b
8:  a  c  b
9:  a  c  b
............. and so on, up to [[100]]
``````

I want to subset the list `ddf` so that the result contains only the data.tables that:

1. have at least 9 rows, and
2. have those 9 rows all identical.

I also want to store this subsetted output.

I have written the following code:

``````
for(i in 1:100){
  m=(as.numeric(nrow(df[[i]]))>= 9)
  if(m == TRUE & df[[i]][1,] = df[[i]][2,] ==
     df[[i]][3,] =df[[i]][4,] =df[[i]][5,] =df[[i]][6,]=
     df[[i]][7,]=df[[i]][8,]=df[[i]][9,]){
    print(df[[i]])
  }
}
``````

Please tell me what's wrong, and how I can generalize this to subset based on `n` identical rows.
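Several things break the loop above. The list is called `ddf`, but the loop indexes `df`; unless `df` is defined elsewhere, that name refers to base R's `df()` density function, so `df[[i]]` fails. Inside `if ( )`, `=` is not a comparison and R rejects it as a syntax error; comparisons use `==`. R's comparison operators also do not chain, so `a == b == c` cannot express "all equal", and comparing two data.table rows with `==` yields one logical value per column rather than a single `TRUE`/`FALSE`. Finally, `print()` only displays a match and does not store it. A minimal corrected sketch follows, assuming "n similar rows" means the first `n` rows are identical (which is what the loop tried to check); the three-element `ddf` here is hypothetical sample data modelled on `[[43]]` to `[[45]]`.

```r
library(data.table)

# Hypothetical stand-in for `ddf`, modelled on [[43]], [[44]] and [[45]].
ddf <- list(
  data.table(V1 = "b", V2 = c("c", "c", "c", "c", "b", "c", "c"), V3 = "a"),
  data.table(V1 = rep("a", 5), V2 = "c", V3 = "a"),
  data.table(V1 = rep("a", 9), V2 = "c", V3 = "b")
)

n <- 9                        # required number of identical leading rows
keep <- logical(length(ddf))  # TRUE where ddf[[i]] passes the test

for (i in seq_along(ddf)) {
  x <- ddf[[i]]
  # && skips the unique() call for tables with fewer than n rows;
  # unique() on the first n rows leaves exactly 1 row if they all match.
  keep[i] <- nrow(x) >= n && nrow(unique(head(x, n))) == 1
}

result <- ddf[keep]  # store the matching data.tables instead of printing them
```

Note that the answer below is stricter: it requires *all* rows of a table to be identical, not just the first `n`.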

When `lst` is your list, then:

``````
lst[sapply(lst, nrow) >= 9 & sapply(lst, function(x) nrow(unique(x))) == 1]
``````

should give you the desired result.

Where:

• `sapply(lst, nrow) >= 9` checks whether each data.table has nine or more rows
• `sapply(lst, function(x) nrow(unique(x))) == 1` checks whether all the rows are the same (a data.table whose rows are all identical has exactly one unique row).
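To generalize this to any `n`, the same test can be wrapped in a function. This is a sketch: `subset_identical()` is a hypothetical helper name, and, like the answer, it requires all rows of a table to be identical (not just the first `n`). It uses `vapply()` instead of `sapply()` so the index is always a logical vector, even for an empty list.

```r
library(data.table)

# Hypothetical helper: keep the data.tables in `lst` that have at least `n`
# rows and whose rows are all identical (exactly one unique row).
subset_identical <- function(lst, n) {
  keep <- vapply(
    lst,
    function(x) nrow(x) >= n && nrow(unique(x)) == 1,
    logical(1)
  )
  lst[keep]
}

# Sample data modelled on [[43]] to [[45]] from the question.
lst <- list(
  data.table(V1 = "b", V2 = c("c", "c", "c", "c", "b", "c", "c"), V3 = "a"),
  data.table(V1 = rep("a", 5), V2 = "c", V3 = "a"),
  data.table(V1 = rep("a", 9), V2 = "c", V3 = "b")
)

res9 <- subset_identical(lst, 9)  # only the 9-row table qualifies
res5 <- subset_identical(lst, 5)  # the 5-row and 9-row tables qualify
```

The 7-row table is excluded for every `n` because its fifth row differs from the others.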

Or with one `sapply` call as @docendodiscimus suggested:

``````
lst[sapply(lst, function(x) nrow(x) >= 9 && nrow(unique(x)) == 1)]
``````
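The same filter can also be written with base R's `Filter()`, which keeps the list elements for which a predicate returns `TRUE`, together with data.table's `uniqueN()`, which counts distinct rows directly. A sketch under the same assumption as the answer (at least 9 rows, all identical), using hypothetical sample data:

```r
library(data.table)

# Sample data modelled on [[43]] to [[45]] from the question.
lst <- list(
  data.table(V1 = "b", V2 = c("c", "c", "c", "c", "b", "c", "c"), V3 = "a"),
  data.table(V1 = rep("a", 5), V2 = "c", V3 = "a"),
  data.table(V1 = rep("a", 9), V2 = "c", V3 = "b")
)

# Filter() keeps elements where the predicate is TRUE;
# uniqueN(x) counts the distinct rows of x across all columns.
res <- Filter(function(x) nrow(x) >= 9 && uniqueN(x) == 1, lst)
```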