luisgonzalez luisgonzalez - 3 months ago 16
R Question

How to transform yes/no rows into proportion with dplyr (preferably)?

Data comes from this RData dataset

Here is the script:

library(dplyr)
library(ggplot2)
load("brfss2013.RData")

test <- brfss2013 %>%
select(chcscncr,exract11) %>%
filter(chcscncr != "NA" , exract11 != "NA") %>%
group_by(exract11,chcscncr) %>%
summarise(count = n())


Which results in this table:

> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]

exract11 chcscncr count
<fctr> <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution) No 287
3 Aerobics video or class Yes 800
4 Aerobics video or class No 7340
5 Backpacking Yes 4
6 Backpacking No 38


I would like to achieve a table that gives the "yes" proportion of each type of sport, something like:

From

Type Ans Count
Sport A yes 45
Sport A no 55
Sport B yes 34
Sport B no 66


to:

Type p(yes)
Sport A 0.45
Sport B 0.34

Answer

prop.table converts totals to proportions (in this case, just x/sum(x) for the values for each group), so for your "From" table:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>%    # `==` doesn't work for NA
    count(exract11, chcscncr) %>%    # equivalent to `group_by(...) %>% summarise(n = n())`
    group_by(exract11) %>%
    mutate(pct = prop.table(n) * 100)    # `* 100` to convert to percent

## Source: local data frame [144 x 4]
## Groups: exract11 [75]
## 
##                                                    exract11 chcscncr     n      pct
##                                                      <fctr>   <fctr> <int>    <dbl>
## 1  Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19  6.20915
## 2  Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287 93.79085
## 3                                   Aerobics video or class      Yes   800  9.82801
## 4                                   Aerobics video or class       No  7340 90.17199
## 5                                               Backpacking      Yes     4  9.52381
## 6                                               Backpacking       No    38 90.47619
## 7                                                 Badminton      Yes     4 10.52632
## 8                                                 Badminton       No    34 89.47368
## 9                                                Basketball      Yes    37  1.64664
## 10                                               Basketball       No  2210 98.35336
## # ... with 134 more rows

For your "to" table, filter to just the "Yes" rows:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>% 
    count(exract11, chcscncr) %>%
    group_by(exract11) %>%
    mutate(p_yes = prop.table(n)) %>%
    filter(chcscncr == "Yes")

## Source: local data frame [69 x 4]
## Groups: exract11 [69]
## 
##                                                                 exract11 chcscncr     n      p_yes
##                                                                   <fctr>   <fctr> <int>      <dbl>
## 1               Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19 0.06209150
## 2                                                Aerobics video or class      Yes   800 0.09828010
## 3                                                            Backpacking      Yes     4 0.09523810
## 4                                                              Badminton      Yes     4 0.10526316
## 5                                                             Basketball      Yes    37 0.01646640
## 6                                             Bicycling machine exercise      Yes   987 0.13708333
## 7                                                              Bicycling      Yes   728 0.08519602
## 8  Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping)      Yes    22 0.11518325
## 9                                                                Bowling      Yes    68 0.09985316
## 10                                                               Boxing       Yes     5 0.01633987
## # ... with 59 more rows

The proportion of "Yes" values is pretty small, as you can see from the first table.