robertevansanders robertevansanders - 3 months ago 13
R Question

tabulating groups by row and column in R, WITHOUT looping

Please feel free to edit the title if there is a more clear one. Here is what I am trying to do.

Without looping (because the dimensions of the matrix will be very large and loops are too slow.)

Given this matrix, A

A = rbind(
c(2, 2, 1, 1, 2, 2, 2 ),
c(2, 2, 3, 2, 2, 3, 2 ),
c(1, 1, 1, 2, 2, 1, 1 ),
c(1, 1, 1, 1, 1, 1, 1 ),
c(1, 2, 1, 1, 1, 1, 1 ),
c(2, 2, 1, 1, 1, 1, 2 ),
'S'=c(1, 2, 3, 4, 5, 6, 7))

A = cbind(A,'R'=c(rep(1:2,3),NA))

R
2 2 1 1 2 2 2 1
2 2 3 2 2 3 2 2
1 1 1 2 2 1 1 1
1 1 1 1 1 1 1 2
1 2 1 1 1 1 1 1
2 2 1 1 1 1 2 2
S 1 2 3 4 5 6 7 NA


We have the bottom row called “S” and the last column called “R”

I want to group by R and S and basically return a TABLE () function of each group, for example,

S=1, R =1

R
**2** 1

**1** 1

**1** 1

S 1


The tricky part is when not all values show up, I still need it to tabulate 0 for the third. (So perhaps I need some workaround)...
Basically, the table would return:

1 2 3
2 1 0

I would like the resulting thing to be

(R * S) x 3


And the output would be:

2 1 0
1 2 0
3 0 0
2 1 0
1 2 0
2 1 0
2 1 0
1 2 0
1 2 0
2 0 1
2 1 0
2 1 0
2 0 1
1 2 0

Answer

Subset the 'A' matrix by removing the last column and last row to create the 'A1', change the rownames and column names with the last column/row, melt to 'long' format, convert to data.table, dcast it to 'wide' (after getting the number of rows or just use the fun.aggregate as length in dcast)

library(reshape2)
library(data.table)
A1 <- A[-nrow(A), -ncol(A)]
row.names(A1) <- A[-nrow(A),"R"]
colnames(A1) <- A["S", - ncol(A)]
dcast(setDT(melt(A1))[,.N ,.(RowCol = paste(Var2, Var1, sep="_"), value)], 
              RowCol~value,value.var="N", fill = 0)
#    RowCol 1 2 3
# 1:    1_1 2 1 0
# 2:    1_2 1 2 0
# 3:    2_1 1 2 0
# 4:    2_2 1 2 0
# 5:    3_1 3 0 0
# 6:    3_2 2 0 1
# 7:    4_1 2 1 0
# 8:    4_2 2 1 0
# 9:    5_1 1 2 0
#10:    5_2 2 1 0
#11:    6_1 2 1 0
#12:    6_2 2 0 1
#13:    7_1 2 1 0
#14:    7_2 1 2 0

Or if we don't need to paste the row/column names

dcast(setDT(melt(t(A1))), Var1+Var2~value, length)

Or we can use table from base R after converting to 'long' format with rep

table(data.frame(RowCol = paste(rep(colnames(A1), nrow(A1)), 
             rep(rownames(A1), ncol(A1)), sep = "_"), c(A1)))
Comments