anrpet - 3 years ago 93
R Question

# R: How to find percentage of occurrences for a category

I have a data frame with 3 columns: Id number, Category, and brand:

``````   X1        X2       X3
1234      Couch     A87
1234      Chair     A18
1234      Chair     A19
1234      Chair     A21
1235      Couch     A89
1235      Couch     A89
1235      Couch     A89
1236      Chair     A21
1236      Chair     A21
1236      Chair     A21
``````

what i want to do is return the percentage of id numbers that have only 1 brand for a category.

I have tried to get a data frame that shows the count of brands per each id number like this:

``````   X1        X2       total.count
1234      Couch        1
1234      Chair        3
1235      Couch        1
1236      Chair        1
``````

using:

``````DF2 = (DF %>%
group_by(X2,X3) %>%
summarise(total.count=n()))
``````

But it isn't working correctly. I would then want to filter out all of the id numbers that have counts more than 1. Any thoughts?

I think data.table is the easiest way to solve something like this.

``````dat = read.table("clipboard", header = TRUE)

library(data.table)
setDT(dat)
``````

Then, to get the output you're looking for we could do this

``````> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
X1    X2 total.count
1: 1234 Couch           1
2: 1234 Chair           3
3: 1235 Couch           1
4: 1236 Chair           1
``````

To limit this to ID numbers with counts of 1 we could do this

``````dat2 = dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]

> dat2[total.count == 1]
X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1
``````

Or, if we wanted to do this without saving dat2, we could do this all in 1 line

``````> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)][total.count == 1]
X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download