have fun - 5 months ago 31

R Question

I need to extend this question: "Convert data frame of counts to proportions in R".

I need to calculate proportions within groups defined by one condition (here, `ID`) while keeping all the other information in the dataset.

Reproducible example:

```
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(cbind(ID, trial,variable1,variable2,variable3,condition))
```

For each variable, I would like the proportion within each ID (i.e. computed separately for each of the 3 IDs).

Ideally, the new variables would be stored in the same data frame, e.g. as `dat$variable1_p`.

I know how to do this with a series of for loops, but I would like to learn how to use the apply family of functions instead, and to be able to extend the approach to more conditions if necessary.

Answer

We can use `adply` from the `plyr` package:

```
library(plyr)
# for each row x, divide its variable1 by the variable1 total of all rows with the same ID
adply(dat, 1, function(x)
  c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1)))
#   ID trial variable1 variable2 variable3 condition variable1_p
# 1  1     a         3         5         4         i  0.20000000
# 2  1     a         8         9         9         j  0.53333333
# 3  1     a         4         4         8         k  0.26666667
# 4  2     a         7        10         5         i  0.50000000
# 5  2     a         6         8        10         j  0.42857143
# 6  2     a         1         1         7         k  0.07142857
# 7  3     a        10         6         3         i  0.47619048
# 8  3     a         9         7         6         j  0.42857143
# 9  3     a         2         3         2         k  0.09523810
```
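For comparison, here is a base R sketch (not part of the original answer) using `ave()`, which applies a function within each group and returns a vector in the original row order, so the result can be assigned straight back to `dat`. The fixed values below are copied from the output above so the result is reproducible; the code assumes the variable columns are numeric.

```
# Fixed data (values taken from the adply output above) so the result is reproducible
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# ave() splits variable1 by ID, applies FUN to each piece,
# and puts the results back in the original row order
dat$variable1_p <- ave(dat$variable1, dat$ID, FUN = function(v) v / sum(v))

dat
```

Because `ave()` keeps the row order, no merge or join is needed afterwards.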

Another option is to use `dplyr`, which would handle cases where there is more than one row per condition per ID:

```
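library(dplyr)
dat %>%
  # total of variable1 for each condition within each ID
  group_by(ID, condition) %>%
  mutate(sum_v1_cond = sum(variable1)) %>%
  ungroup() %>%
  # divide each condition total by the ID total
  group_by(ID) %>%
  mutate(variable1_p = sum_v1_cond / sum(variable1)) %>%
  select(-sum_v1_cond) %>%
  ungroup()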
```
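With several variables, `dplyr` (version 1.0.0 or later, which introduced `across()`) can create all the `_p` columns in a single `mutate()` call and store them back in `dat`. This is a sketch that assumes one row per `ID`/`condition` combination, as in the example data: it divides each value by its `ID` total rather than aggregating by condition first. The fixed values are copied from the output above.

```
library(dplyr)

# Fixed data (values taken from the adply output above)
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  variable2 = c(5, 9, 4, 10, 8, 1, 6, 7, 3),
  variable3 = c(4, 9, 8, 5, 10, 7, 3, 6, 2),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

dat <- dat %>%
  group_by(ID) %>%
  # for every column whose name starts with "variable",
  # add <name>_p = value / total of that column within the ID
  mutate(across(starts_with("variable"), ~ .x / sum(.x), .names = "{.col}_p")) %>%
  ungroup()

dat
```

To group by more conditions, add them to `group_by()`, e.g. `group_by(ID, trial)`.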

To add proportions for `variable1`, `variable2`, and `variable3` at once:

```
adply(dat, 1, function(x)
  c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1),
    'variable2_p' = x$variable2 / sum(dat[x$ID == dat$ID,]$variable2),
    'variable3_p' = x$variable3 / sum(dat[x$ID == dat$ID,]$variable3)))
```
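Since the question asks about the apply family, here is a base R sketch with `lapply()` that avoids writing one line per variable. The vector `vars` is an assumption: list whichever columns you need. Grouping by more conditions only means passing extra grouping vectors to `ave()`; `dat$trial` is used here purely for illustration (it is constant, so the groups are still the IDs).

```
# Fixed data (values taken from the adply output above)
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  variable2 = c(5, 9, 4, 10, 8, 1, 6, 7, 3),
  variable3 = c(4, 9, 8, 5, 10, 7, 3, 6, 2),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

vars <- c("variable1", "variable2", "variable3")  # columns to convert (adjust as needed)

# lapply() visits each column; ave() computes value / group total within each
# ID-by-trial group and returns a vector in the original row order
dat[paste0(vars, "_p")] <- lapply(dat[vars], function(v)
  ave(v, dat$ID, dat$trial, FUN = function(x) x / sum(x)))

dat
```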

Data (built with `data.frame()` directly rather than `data.frame(cbind(...))`, so that the variable columns stay numeric):

```
set.seed(123)
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(ID, trial,variable1,variable2,variable3,condition,
                  stringsAsFactors = FALSE)
```
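The reason for avoiding `cbind()` here: it first combines its arguments into a single matrix, and a matrix holds only one type. Mixing numbers with the character columns (`trial`, `condition`) therefore turns every value into a string, and `sum()` then fails. A minimal check:

```
ID <- rep(c(1, 2, 3), each = 3)
condition <- rep(c("i", "j", "k"), 3)

m <- cbind(ID, condition)
typeof(m)            # "character": the numeric IDs became strings

bad  <- data.frame(m)
good <- data.frame(ID, condition, stringsAsFactors = FALSE)
is.numeric(bad$ID)   # FALSE (character, or factor before R 4.0.0)
is.numeric(good$ID)  # TRUE

# sum() refuses non-numeric input, which is why the cbind() version cannot be used directly
try(sum(bad$ID))
```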