anrpet anrpet - 3 years ago 93
R Question

R: How to find percentage of occurrences for a category

I have a data frame with 3 columns: Id number, Category, and brand:

X1 X2 X3
1234 Couch A87
1234 Chair A18
1234 Chair A19
1234 Chair A21
1235 Couch A89
1235 Couch A89
1235 Couch A89
1236 Chair A21
1236 Chair A21
1236 Chair A21


what i want to do is return the percentage of id numbers that have only 1 brand for a category.

I have tried to get a data frame that shows the count of brands per each id number like this:

X1 X2 total.count
1234 Couch 1
1234 Chair 3
1235 Couch 1
1236 Chair 1


using:

DF2 = (DF %>%
group_by(X2,X3) %>%
summarise(total.count=n()))


But it isn't working correctly. I would then want to filter out all of the id numbers that have counts more than 1. Any thoughts?

Answer Source

I think data.table is the easiest way to solve something like this.

dat = read.table("clipboard", header = TRUE)

library(data.table)
setDT(dat)

Then, to get the output you're looking for we could do this

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
     X1    X2 total.count
1: 1234 Couch           1
2: 1234 Chair           3
3: 1235 Couch           1
4: 1236 Chair           1

To limit this to ID numbers with counts of 1 we could do this

dat2 = dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]

> dat2[total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1

Or, if we wanted to do this without saving dat2, we could do this all in 1 line

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)][total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download