Diana01, 4 years ago
R Question

Subset data set based on most frequent values in a column

I have a data set that looks like the following:

head(data1)
  Data number PatientSID               Condition
1           1   24663193 acute coronary syndrome
2           3    7451277          abdominal pain
3           6    7449440               epistaxis
4           8    7350669                leg pain
5           9    7328477       chronic back pain
6          11    7324432               back pain


I used the aggregate function to see the frequency of patient Conditions:

x <- aggregate(data.frame(count = data1$Condition), list(value = data1$Condition), length)
head(x,10)
                       value count
1                          3   108
2         4 wheeler accident     1
3                  abdominal     1
4         abdominal aneurysm     1
5  abdominal aortic aneurysm     1
6         abdominal bloating     2
7           abdominal cramps     2
8       abdominal discomfort     6
9       abdominal distension     2
10      abdominal distention    21


Based on the output above, I want to subset data1 into a data frame that contains only the rows whose Condition has a count >= 10. For instance, my subset would contain all rows with the conditions "3" (count 108) and "abdominal distention" (count 21).
How can I do this?

Answer

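You can use dplyr to keep only the conditions that occur at least 10 times, and then subset data1 to the rows whose Condition is one of them:

library(dplyr)  # provides %>% and filter()

# conditions with a count of at least 10
x.sub <- x %>%
         filter(count >= 10)

# rows of data1 whose Condition is one of those values
data1.sub <- data1[data1$Condition %in% x.sub$value, ]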
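
If you would rather not load dplyr, the same subset can probably be built in base R. The sketch below is only an illustration: the small data1 is a made-up stand-in for the real data (its values are assumptions, not taken from the question), and the names counts, frequent, n_per_row and data1.sub2 are hypothetical helpers introduced here.

```r
# Toy stand-in for data1; the values are invented for illustration only
data1 <- data.frame(
  PatientSID = 1:15,
  Condition  = c(rep("abdominal distention", 11), rep("epistaxis", 3), "leg pain"),
  stringsAsFactors = FALSE
)

# Option 1: count each condition with table(), keep the names with count >= 10
counts    <- table(data1$Condition)
frequent  <- names(counts)[counts >= 10]
data1.sub <- data1[data1$Condition %in% frequent, ]

# Option 2: compute the group size for every row with ave(), then filter on it
n_per_row  <- ave(seq_along(data1$Condition), data1$Condition, FUN = length)
data1.sub2 <- data1[n_per_row >= 10, ]
```

Both options skip the intermediate aggregate() result. Option 2 also avoids the %in% lookup, because each row already carries the size of its own group. Note that rows with a missing Condition are not handled specially in this sketch.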