Noobie - 1 year ago 87
R Question

# dplyr : how to get two-way tables with marginal proportions?

Consider this

``````var1<- c('A','B','C','C','G')
var2<- c('G','B','B','F','A')
df = df=data.frame(var1,var2)

> df
var1 var2
1    A    G
2    B    B
3    C    B
4    C    F
5    G    A
``````

I want to get a two-way table between
`var1`
and
`var2`
.

I can use

``````> df %>% count(var1,var2) %>% spread(var2, n, fill = 0)
Source: local data frame [4 x 5]
Groups: var1 [4]

var1     A     B     F     G
* <fctr> <dbl> <dbl> <dbl> <dbl>
1      A     0     0     0     1
2      B     0     1     0     0
3      C     0     1     1     0
4      G     1     0     0     0
``````

or using
`table`

``````> table( df\$var1, df\$var2)

A B F G
A 0 0 0 1
B 0 1 0 0
C 0 1 1 0
G 1 0 0 0
``````

Both give the same results. But now suppose I am interested in the row or columns proportions. I can use
`table`
again here

``````> prop.table(table( df\$var1, df\$var2), margin =2) # for column

A   B   F   G
A 0.0 0.0 0.0 1.0
B 0.0 0.5 0.0 0.0
C 0.0 0.5 1.0 0.0
G 1.0 0.0 0.0 0.0

> prop.table(table( df\$var1, df\$var2), margin =1) # for rows

A   B   F   G
A 0.0 0.0 0.0 1.0
B 0.0 1.0 0.0 0.0
C 0.0 0.5 0.5 0.0
G 1.0 0.0 0.0 0.0
``````

How can I reproduce these with
`dplyr`
(the shortest syntax)?

The solution below seems to work (only in one case), although I don't understand why: what does the
`prop.table(n)`
function does here?

``````> df %>%  count(var1,var2) %>% mutate(prop = prop.table(n)) %>% spread(var2, prop, fill = 0)
Source: local data frame [4 x 6]
Groups: var1 [4]

var1     n     A     B     F     G
* <fctr> <int> <dbl> <dbl> <dbl> <dbl>
1      A     1     0   0.0   0.0     1
2      B     1     0   1.0   0.0     0
3      C     1     0   0.5   0.5     0
4      G     1     1   0.0   0.0     0
``````

Also, using
`margin`
in prop.table fails here.

``````> df %>%  count(var1,var2) %>% mutate(prop = prop.table(n ,margin =1)) %>% spread(var2, prop, fill = 0)
Error in eval(substitute(expr), envir, enclos) : 'x' is not an array
``````

Thanks!

You can use `group_by` to mimic the desired margin, then divide each count by the sum of counts in the group (note, remove column `n` to avoid problems when `spread`ing):

``````df %>%
count(var1,var2) %>%
group_by(var2) %>%
mutate(prop = n / sum(n)) %>%
select(-n) %>%
``````    var1     A     B     F     G