Noobie Noobie - 1 month ago 10
R Question

dplyr : how to get two-way tables with marginal proportions?

Consider this

var1<- c('A','B','C','C','G')
var2<- c('G','B','B','F','A')
df = df=data.frame(var1,var2)

> df
var1 var2
1 A G
2 B B
3 C B
4 C F
5 G A


I want to get a two-way table between
var1
and
var2
.

I can use

> df %>% count(var1,var2) %>% spread(var2, n, fill = 0)
Source: local data frame [4 x 5]
Groups: var1 [4]

var1 A B F G
* <fctr> <dbl> <dbl> <dbl> <dbl>
1 A 0 0 0 1
2 B 0 1 0 0
3 C 0 1 1 0
4 G 1 0 0 0


or using
table


> table( df$var1, df$var2)

A B F G
A 0 0 0 1
B 0 1 0 0
C 0 1 1 0
G 1 0 0 0


Both give the same results. But now suppose I am interested in the row or columns proportions. I can use
table
again here

> prop.table(table( df$var1, df$var2), margin =2) # for column

A B F G
A 0.0 0.0 0.0 1.0
B 0.0 0.5 0.0 0.0
C 0.0 0.5 1.0 0.0
G 1.0 0.0 0.0 0.0

> prop.table(table( df$var1, df$var2), margin =1) # for rows

A B F G
A 0.0 0.0 0.0 1.0
B 0.0 1.0 0.0 0.0
C 0.0 0.5 0.5 0.0
G 1.0 0.0 0.0 0.0


How can I reproduce these with
dplyr
(the shortest syntax)?

The solution below seems to work (only in one case), although I don't understand why: what does the
prop.table(n)
function does here?

> df %>% count(var1,var2) %>% mutate(prop = prop.table(n)) %>% spread(var2, prop, fill = 0)
Source: local data frame [4 x 6]
Groups: var1 [4]

var1 n A B F G
* <fctr> <int> <dbl> <dbl> <dbl> <dbl>
1 A 1 0 0.0 0.0 1
2 B 1 0 1.0 0.0 0
3 C 1 0 0.5 0.5 0
4 G 1 1 0.0 0.0 0


Also, using
margin
in prop.table fails here.

> df %>% count(var1,var2) %>% mutate(prop = prop.table(n ,margin =1)) %>% spread(var2, prop, fill = 0)
Error in eval(substitute(expr), envir, enclos) : 'x' is not an array


Thanks!

Answer

You can use group_by to mimic the desired margin, then divide each count by the sum of counts in the group (note, remove column n to avoid problems when spreading):

df %>%
  count(var1,var2) %>%
  group_by(var2) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(var2, prop, fill = 0)

gives

    var1     A     B     F     G
1      A     0   0.0     0     1
2      B     0   0.5     0     0
3      C     0   0.5     1     0
4      G     1   0.0     0     0