Question by Adela (R), asked 3 months ago

Contingency tables from data.frame columns

I'm trying to create a 4-way contingency table from my data set.
My data set looks like this:

a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)  # "y" or "z"
group2 <- sample(letters[7:10], 12, replace = TRUE)   # "g", "h", "i" or "j"

df <- data.frame(a, b, group1, group2)


I tried the aggregate function. Everything is OK when creating a 3-way contingency table:

aggregate(cbind(a, b) ~ group1, data = df, FUN = table)
  group1 a.0 a.1 b.0 b.1
1      y   3   4   3   4
2      z   2   3   2   3


However, when I add a second grouping variable, the output is confusing and not what I want.

aggregate(. ~ group1 + group2, data = df, FUN = table)
  group1 group2    a    b
1      y      g    3    3
2      z      g    1    1
3      z      h    1    1
4      y      i    1    1
5      z      i    1    1
6      y      j 2, 1    3
7      z      j 1, 1 1, 1


Because my original data set is quite large, I would appreciate an elegant, automatic approach. Thanks for any suggestions!

Answer

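The expected output is not entirely clear. Perhaps we need melt/dcast: melt reshapes a and b into long format (one row per group1, group2, variable and value), and dcast then counts the rows in each group1 + group2 by variable + value cell using length.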

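library(data.table)
# melt: long format with columns group1, group2, variable (a/b) and value (0/1)
# dcast: one row per group1/group2, one count column per variable_value pair
dcast(melt(setDT(df), id.vars = c("group1", "group2")),
      group1 + group2 ~ variable + value, length)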

Or use recast from reshape2, which wraps melt and dcast in a single call:

library(reshape2)
recast(df, measure.var = c("a", "b"), ... ~ variable + value, length)
#    group1 group2 a_0 a_1 b_0 b_1
#1      y      g   1   4   3   2
#2      y      h   1   0   1   0
#3      y      j   1   1   0   2
#4      z      g   2   0   0   2
#5      z      i   0   1   0   1
#6      z      j   0   1   1   0
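For comparison, the same reshape-then-count idea can be sketched in base R without data.table or reshape2. This is a minimal sketch under assumptions: group1 and group2 use fixed, hypothetical values instead of sample() so the counts are reproducible, the long data frame (named long here, an illustrative name) is built by hand in place of melt, and xtabs produces a genuine 4-way table of group1 by group2 by variable by value. Because xtabs keeps every combination of levels, empty groups appear with zero counts, similar to drop = FALSE.

```r
# Same a and b as in the question; group1/group2 are hypothetical fixed values
a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- c("y", "z", "y", "y", "z", "y", "z", "y", "y", "z", "z", "y")
group2 <- c("g", "g", "h", "i", "j", "g", "h", "i", "j", "g", "h", "j")
df <- data.frame(a, b, group1, group2, stringsAsFactors = FALSE)

# Long format built by hand (what melt does): one row per original row and
# per measured variable, holding the variable name and its 0/1 value
long <- data.frame(
  group1   = rep(df$group1, times = 2),
  group2   = rep(df$group2, times = 2),
  variable = rep(c("a", "b"), each = nrow(df)),
  value    = c(df$a, df$b),
  stringsAsFactors = FALSE
)

# 4-way contingency table: group1 x group2 x variable x value
tab4 <- xtabs(~ group1 + group2 + variable + value, data = long)

# Flat view: group1/group2 as rows, variable/value as columns
ftable(tab4, row.vars = c("group1", "group2"))
```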

With the same data, the OP's aggregate call gives this ragged output:

aggregate(. ~ group1 + group2, data = df, FUN = table)
#  group1 group2    a    b
#1      y      g 1, 4 3, 2
#2      z      g    2    2
#3      y      h    1    1
#4      z      i    1    1
#5      y      j 1, 1    2
#6      z      j    1    1

If we want aggregate to report both levels (0 and 1) for every group, convert the values to a factor with the levels specified before calling table, so that absent values are counted as 0:

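# With fixed levels, table() returns a 0 count and a 1 count for every group;
# do.call(data.frame, ...) splits the matrix columns into a.0, a.1, b.0, b.1
do.call(data.frame, aggregate(cbind(a, b) ~ group1 + group2, data = df,
        FUN = function(x) table(factor(x, levels = 0:1))))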
#  group1 group2 a.0 a.1 b.0 b.1
#1      y      g   1   4   3   2
#2      z      g   2   0   0   2
#3      y      h   1   0   1   0
#4      z      i   0   1   0   1
#5      y      j   1   1   0   2
#6      z      j   0   1   1   0
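The factor step matters because table() only creates cells for values that actually occur, which is why the OP's output has a different number of entries per group. A small illustration, using a hypothetical group x in which only the value 1 occurs:

```r
# Hypothetical group in which only the value 1 occurs
x <- c(1, 1, 1)

# Plain table(): a single cell, named "1"
t_plain <- table(x)
print(t_plain)

# Explicit levels: cells for both 0 and 1, with the absent 0 counted as 0
t_fixed <- table(factor(x, levels = 0:1))
print(t_fixed)
```

Because every group now yields a vector of the same length, aggregate can simplify the results into the a.0, a.1, b.0, b.1 columns.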

If we want all combinations of group1 and group2, including those that never occur in the data, dcast has a drop = FALSE argument:

# drop = FALSE keeps empty group1/group2 combinations; length() of an empty cell is 0
dcast(melt(setDT(df), id.vars = c("group1", "group2")),
      group1 + group2 ~ variable + value, length, drop = FALSE)

Or in recast

recast(df, measure.var = c("a", "b"), ... ~ variable + value, length, drop = FALSE) 
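Without data.table or reshape2, a similar all-combinations result can be sketched by completing the aggregate output against a grid of every group1/group2 pair. This is a minimal sketch, assuming fixed hypothetical group values (chosen so that the pair z/i never occurs) and using expand.grid plus a left join with merge; the names counts, all_groups and full are illustrative only.

```r
a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
# Hypothetical fixed groups; the combination z/i never occurs
group1 <- c("y", "z", "y", "y", "z", "y", "z", "y", "y", "z", "z", "y")
group2 <- c("g", "g", "h", "i", "j", "g", "h", "i", "j", "g", "h", "j")
df <- data.frame(a, b, group1, group2, stringsAsFactors = FALSE)

# Per-group counts of 0s and 1s (only combinations present in df)
counts <- do.call(data.frame, aggregate(cbind(a, b) ~ group1 + group2, data = df,
                  FUN = function(x) table(factor(x, levels = 0:1))))

# Every group1/group2 pair, including ones with no rows in df
all_groups <- expand.grid(group1 = sort(unique(df$group1)),
                          group2 = sort(unique(df$group2)),
                          stringsAsFactors = FALSE)

# Left join the counts onto the full grid, then turn missing counts into 0
full <- merge(all_groups, counts, all.x = TRUE)
full[is.na(full)] <- 0
full
```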

NOTE: The question did not call set.seed before sample, so the output shown here will differ from the OP's output.
