user3674993 - 25 days ago 17
R Question

Get percentage for cohort day and daycount

I have a cohort retention data frame

> cohortData
       cohort dayCount count
1  25/10/2016        0   238
2  25/10/2016        1   137
3  25/10/2016        2    78
4  25/10/2016        3    32
5  25/10/2016        4    21
6  25/10/2016        5    25
7  26/10/2016        0   134
8  26/10/2016        1    97
9  26/10/2016        2    49
10 26/10/2016        3    22
11 26/10/2016        4    22
12 27/10/2016        0   136
13 27/10/2016        1    88
14 27/10/2016        2    38
15 27/10/2016        3    15
16 28/10/2016        0   138
17 28/10/2016        1    25
18 28/10/2016        2    19
19 29/10/2016        0   144
20 29/10/2016        1    28
21 30/10/2016        0   135


What I want to do is add a percent column that divides each count by the count for the same cohort at dayCount 0. For cohort 25/10/2016, for example, the values for dayCount 0 through 2 would be 238/238, 137/238 and 78/238.

I have looked at prop.table but was not able to get the result I want. I have also tried
cohortData$count / (by=list(cohortData$cohort, cohortData$dayCount==0))
but that is not correct and just gives errors.

I can convert the data into an NxN matrix, compute a second matrix with the % values, then unlist it and join it back to the data frame above, but I am sure there must be a much simpler and more elegant way to go about it.

J_F
Answer

A dplyr solution: group by cohort, then divide each count by that cohort's count at dayCount 0:

library(dplyr)

cohortData %>%
  group_by(cohort) %>%
  mutate(percentage = count / count[dayCount == 0])
#        cohort dayCount count percentage
#        <fctr>    <int> <int>      <dbl>
#1  25/10/2016        0   238 1.00000000
#2  25/10/2016        1   137 0.57563025
#3  25/10/2016        2    78 0.32773109
#4  25/10/2016        3    32 0.13445378
#5  25/10/2016        4    21 0.08823529
#6  25/10/2016        5    25 0.10504202
#7  26/10/2016        0   134 1.00000000
#8  26/10/2016        1    97 0.72388060
#9  26/10/2016        2    49 0.36567164
#10 26/10/2016        3    22 0.16417910
## ... with 11 more rows
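
If you would rather not load dplyr, the same per-cohort division can be sketched in base R. This is only one possible variant, not part of the answer above: it rebuilds the first two cohorts from the question (the column types are an assumption), and the helper name `base` is made up for illustration.

```r
# Rebuild the first two cohorts from the question (column types are assumed).
cohortData <- data.frame(
  cohort   = c(rep("25/10/2016", 6), rep("26/10/2016", 5)),
  dayCount = c(0:5, 0:4),
  count    = c(238, 137, 78, 32, 21, 25, 134, 97, 49, 22, 22)
)

# Keep only the day-0 rows: one baseline count per cohort.
base <- cohortData[cohortData$dayCount == 0, ]

# For every row, look up its cohort's baseline with match() and divide.
cohortData$percentage <-
  cohortData$count / base$count[match(cohortData$cohort, base$cohort)]

print(cohortData)
```

With this lookup, a cohort that has no dayCount 0 row gets NA in the percentage column, because match() returns NA when it finds no baseline.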