Mihael Mihael - 13 days ago 7
R Question

Calculate proportions of counts within count categories + dependent on other categorical variables

I hope the convoluted title makes sense, but the problem I have is not so easy to get a head around.

The toy dataset lists customer visits along with customer exemption status and visit type:

df <- structure(list(Customer = structure(c(8L, 2L, 5L, 4L, 4L, 1L,
1L, 6L, 6L, 7L, 7L, 7L, 3L, 3L, 3L), .Label = c("Aaron", "Elizabeth",
"Frank", "John", "Mary", "Pam", "Rob", "Sam"), class = "factor"),
Exemption = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Exempt", "Non-exempt"
), class = "factor"), Type = structure(c(1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("Type 1",
"Type 2"), class = "factor")), .Names = c("Customer", "Exemption",
"Type"), class = "data.frame", row.names = c(NA, -15L))

Customer Exemption Type
1 Sam Non-exempt Type 1
2 Elizabeth Exempt Type 1
3 Mary Exempt Type 2
4 John Non-exempt Type 1
5 John Non-exempt Type 2
6 Aaron Non-exempt Type 2
7 Aaron Non-exempt Type 2
8 Pam Exempt Type 2
9 Pam Exempt Type 2
10 Rob Non-exempt Type 2
11 Rob Non-exempt Type 2
12 Rob Non-exempt Type 1
13 Frank Exempt Type 1
14 Frank Exempt Type 1
15 Frank Exempt Type 2


I want to fist categorize customers by the number of visits they made, then within that calculate the proportions of Type1/2 visits, and maybe also break down the results by exemption status, such as the output looks like this:

Number_of_visits Exemption Type Proportion
1 1 Non-exempt Type 1 1.00
2 1 Non-exempt Type 2 0.00
3 1 Exempt Type 1 0.50
4 1 Exempt Type 2 0.50
5 2 Non-exempt Type 1 0.25
6 2 Non-exempt Type 2 0.75
7 2 Exempt Type 1 0.00
8 2 Exempt Type 2 1.00
9 3 Non-exempt Type 1 0.33
10 3 Non-exempt Type 2 0.67
11 3 Exempt Type 1 0.67
12 3 Exempt Type 2 0.33


I tried a few things with
group_by(Customer, Type) %>% summarise(n())
using
dplyr
, which did not appear to be right.

Answer

You can use count from dplyr to count the number of occurrences of Exemption and Type grouped by Number_of_visits:

library(dplyr)
library(tidyr)
res <- df %>% group_by(Customer) %>% 
              mutate(Number_of_visits=n()) %>% 
              group_by(Number_of_visits) %>% 
              count(Exemption, Type) %>%
              complete(Type, fill=list(n=0)) %>%
              group_by(Number_of_visits,Exemption) %>% 
              mutate(Proportion=n/sum(n))

Notes:

  1. First group_by Customer to compute the number of visits using n().
  2. Then group_by Number_of_visits and use count to count the number of occurrences for each value pair for Exemption and Type. This creates a column named n containing this count.
  3. Use tidyr::complete to fill in any missing value pair for Exemption and Type with a count of zero.
  4. Finally, group_by both Number_of_visits and Exemption to compute the desired Proportion.

The result using your data is as expected.

print(res)
##Source: local data frame [12 x 5]
##Groups: Number_of_visits, Exemption [6]
##
##   Number_of_visits  Exemption   Type     n Proportion
##              <int>     <fctr> <fctr> <dbl>      <dbl>
##1                 1     Exempt Type 1     1  0.5000000
##2                 1     Exempt Type 2     1  0.5000000
##3                 1 Non-exempt Type 1     1  1.0000000
##4                 1 Non-exempt Type 2     0  0.0000000
##5                 2     Exempt Type 1     0  0.0000000
##6                 2     Exempt Type 2     2  1.0000000
##7                 2 Non-exempt Type 1     1  0.2500000
##8                 2 Non-exempt Type 2     3  0.7500000
##9                 3     Exempt Type 1     2  0.6666667
##10                3     Exempt Type 2     1  0.3333333
##11                3 Non-exempt Type 1     1  0.3333333
##12                3 Non-exempt Type 2     2  0.6666667
Comments