MW Frost - 2 months ago
R Question

Calculating percent of row total with plyr

I am currently using cast on a melted table to calculate the total of each value at each combination of the ID variables ID1 (row names) and ID2 (column headers), along with grand totals for each row using margins="grand_col".

c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")


   ID1    ID2a    ID2b   ID2c   ID2d    ID2e    (all)
1 ID1a 6459695  885473 648019 453613 1777308 10224108
2 ID1b 7263529 1411355 587785 612730 2458672 12334071
3 ID1c 7740364 1253524 682977 886897 3559283 14123045


So far, so R-like.

Then I divide each cell by its row total to get a percentage of the total.

c[,2:6]<-c[,2:6] / c[,7]


This looks kludgy. Is there something I should be doing in cast or maybe in plyr to handle the percent of margin calculation in the first command?

Thanks,
Matt

Answer

Assuming your source table looks something like this:

dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a", 
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L, 
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L, 
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA, 
-15L), class = "data.frame")

> head(dfm)
   ID1  ID2   value
1 ID1a ID2a 6459695
2 ID1b ID2a 7263529
3 ID1c ID2a 7740364
4 ID1a ID2b  885473
5 ID1b ID2b 1411355
6 ID1c ID2b 1253524

You can use ddply first to calculate the percentages, and then cast to present the data in the required format:

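library(reshape)
library(plyr)

# Per ID1 group: keep ID2 and compute each value's share of the group total
df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
# Spread ID2 into columns, naming the value column explicitly
dfc <- cast(df1, ID1 ~ ID2, value = "pct")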

dfc
   ID1      ID2a       ID2b       ID2c       ID2d      ID2e
1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
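
For comparison, the group-wise division that ddply performs here can also be sketched in base R with ave(), which returns each group's sum aligned to the original rows. This is an assumed equivalent rather than part of the approach above, and the compact data.frame() call simply rebuilds the sample dfm from the dput() output.

```r
# Rebuild the sample data shown above (long format).
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# ave() repeats each ID1 group's total on every row of that group,
# so the division yields each value's share of its row total.
dfm$pct <- dfm$value / ave(dfm$value, dfm$ID1, FUN = sum)

# Sanity check: the shares within each ID1 should sum to 1.
tapply(dfm$pct, dfm$ID1, sum)
```

The result stays in long format, so it would still need to be reshaped (for example with cast) to get one column per ID2.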

Compared to your example, this is missing the row totals; these need to be added separately.

I'm not sure, though, whether this solution is more elegant than the one you currently have.
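
As for the missing row totals, one possible base-R route is to build the ID1 by ID2 table with xtabs(), take row shares with prop.table(), and append the totals from rowSums(). The sketch below is an alternative to plyr and cast, not a feature of either; the names tab, pct, and res are illustrative, and the data.frame() call just rebuilds the sample dfm.

```r
# Rebuild the sample data shown above (long format).
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

tab <- xtabs(value ~ ID1 + ID2, data = dfm)  # sum of value per ID1 x ID2 cell
pct <- prop.table(tab, margin = 1)           # each cell divided by its row total

# Convert to a wide data frame and append the raw row totals.
res <- as.data.frame.matrix(pct)
res[["(all)"]] <- as.numeric(rowSums(tab))
res
```

Here ID1 ends up in the row names rather than in its own column, and the (all) column holds the raw totals, as in the table from the question.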
