MW Frost - 1 year ago 167
R Question

# Calculating percent of row total with plyr

I am currently using `cast` on a melted table to calculate the total of each value at the combination of ID variables ID1 (row names) and ID2 (column headers), along with grand totals for each row using `margins="grand_col"`.

`c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")`

``````
   ID1      ID2a    ID2b    ID2c    ID2d    ID2e    (all)
1  ID1a  6459695  885473  648019  453613 1777308 10224108
2  ID1b  7263529 1411355  587785  612730 2458672 12334071
3  ID1c  7740364 1253524  682977  886897 3559283 14123045
``````

So far, so R-like.

Then I divide each cell by its row total to get a percentage of the total.

``````
c[,2:6] <- c[,2:6] / c[,7]
``````

This looks kludgy. Is there something I should be doing in `cast` or maybe in `plyr` to handle the percent of margin calculation in the first command?

Thanks,
Matt

Assuming your source table looks something like this:

``````
dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a",
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L,
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L,
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA,
-15L), class = "data.frame")

head(dfm)
#    ID1  ID2   value
# 1 ID1a ID2a 6459695
# 2 ID1b ID2a 7263529
# 3 ID1c ID2a 7740364
# 4 ID1a ID2b  885473
# 5 ID1b ID2b 1411355
# 6 ID1c ID2b 1253524
``````

You can use `ddply` first to calculate the percentages, and then `cast` to present the data in the required format:

``````
library(reshape)
library(plyr)

# Within each ID1 group, divide every value by that group's total
df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
# Spread ID2 across the columns, one row per ID1
dfc <- cast(df1, ID1 ~ ID2)

dfc
#    ID1      ID2a       ID2b       ID2c       ID2d      ID2e
# 1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
# 2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
# 3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
``````

Compared to your example, this is missing the row totals; these need to be added separately.
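One way to add the totals back is to sum `value` per `ID1` in the long table and join the result onto the wide percentage table. The following is only a sketch in base R: `pct`, `long`, `totals`, and `with_totals` are illustrative names, `pct` is a hand-typed stand-in for the first two percentage columns of the output above, and `long` repeats the 15 values of the source table.

```r
# Stand-in for the wide percentage table (first two ID2 columns only,
# values copied from the output above)
pct <- data.frame(
  ID1  = c("ID1a", "ID1b", "ID1c"),
  ID2a = c(0.6318101, 0.5888996, 0.5480662),
  ID2b = c(0.08660638, 0.11442735, 0.08875735)
)

# Stand-in for the long source table: ID1 cycles a, b, c within each ID2 level
long <- data.frame(
  ID1   = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  value = c(6459695, 7263529, 7740364,  885473, 1411355, 1253524,
             648019,  587785,  682977,  453613,  612730,  886897,
            1777308, 2458672, 3559283)
)

# Sum value within each ID1, rename the summed column, then join on ID1
totals <- aggregate(value ~ ID1, data = long, FUN = sum)
names(totals)[names(totals) == "value"] <- "total"
with_totals <- merge(pct, totals, by = "ID1")
```

Joining on `ID1` rather than binding columns by position means the totals stay attached to the right rows even if the two tables are sorted differently.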

I'm not sure, though, whether this solution is more elegant than the one you currently have.
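If elegance is the main concern, base R may be worth a look as well. This is a different technique from the `plyr`/`reshape` route above, sketched under the assumption that the long table has the columns `ID1`, `ID2`, and `value`; the names `wide`, `pct`, and `pct_sweep` are illustrative. `xtabs` sums `value` for each `ID1` by `ID2` cell, and `prop.table` with `margin = 1` divides each row by its total.

```r
# Long table rebuilt from the values in the question
dfm <- data.frame(
  ID1   = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2   = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364,  885473, 1411355, 1253524,
             648019,  587785,  682977,  453613,  612730,  886897,
            1777308, 2458672, 3559283)
)

# Sum value for every ID1 x ID2 combination (rows = ID1, columns = ID2)
wide <- xtabs(value ~ ID1 + ID2, data = dfm)

# Divide each row by its own total
pct <- prop.table(wide, margin = 1)

# The same division written out explicitly with sweep()
pct_sweep <- sweep(wide, 1, rowSums(wide), "/")
```

`rowSums(wide)` gives the row totals that `margins="grand_col"` produced in the question, so they remain available without a second aggregation.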
