have fun have fun - 3 months ago 14
R Question

convert data frame of counts to proportions by conditions in R

I would like to expand on this earlier question: convert data frame of counts to proportions in R

I need to calculate proportion by one condition and retain the information of the dataset.

Reproducible example:

ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
# build the data frame directly; wrapping the columns in cbind() would coerce them all to character
dat <- data.frame(ID, trial, variable1, variable2, variable3, condition)


For each variable I would like to compute the proportion within each ID (i.e. three times, once per ID).
Ideally the new variables would be stored in the same data frame, for example as
dat$variable1_p


I know how to do this with a series of for loops, but I would like to learn how to use the apply family of functions instead. I would also like to be able to extend the solution to more conditions if necessary.

Answer

We can use adply from the plyr package:

library(plyr)
adply(dat, 1, function(x)
    c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1)))

#   ID trial variable1 variable2 variable3 condition variable1_p
# 1  1     a         3         5         4         i  0.20000000
# 2  1     a         8         9         9         j  0.53333333
# 3  1     a         4         4         8         k  0.26666667
# 4  2     a         7        10         5         i  0.50000000
# 5  2     a         6         8        10         j  0.42857143
# 6  2     a         1         1         7         k  0.07142857
# 7  3     a        10         6         3         i  0.47619048
# 8  3     a         9         7         6         j  0.42857143
# 9  3     a         2         3         2         k  0.09523810

Another option is to use dplyr, which would handle cases where there is more than one row per condition per ID:

library(dplyr)
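dat %>%
    # total of variable1 within each ID/condition pair
    group_by(ID, condition) %>%
    mutate(sum_v1_cond = sum(variable1)) %>%
    ungroup() %>%
    # regroup by ID only, so sum(variable1) is now the ID total
    group_by(ID) %>%
    mutate(variable1_p = sum_v1_cond / sum(variable1)) %>%
    ungroup() %>%
    select(-sum_v1_cond)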
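
The same grouping idea can probably be extended to all three variables at once. The following is a minimal sketch, assuming dplyr 1.0 or later (for across()); the name dat_p is only illustrative. Note that it computes each row's share of its ID total, which matches the pipeline above only when there is one row per condition per ID, as in the example data.

```r
library(dplyr)

# Same construction as the "Data" section of this answer.
set.seed(123)
dat <- data.frame(
  ID = rep(c(1, 2, 3), each = 3),
  trial = rep("a", 9),
  variable1 = sample(1:10, 9),
  variable2 = sample(1:10, 9),
  variable3 = sample(1:10, 9),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# dat_p is a hypothetical name for the result.
dat_p <- dat %>%
  group_by(ID) %>%
  # divide every "variable*" column by its total within the ID,
  # storing the result in a new column with the suffix "_p"
  mutate(across(starts_with("variable"), ~ .x / sum(.x), .names = "{.col}_p")) %>%
  ungroup()
```

To condition on more columns, it should be enough to add them to group_by(), for example group_by(ID, trial).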

Edit - here's a full solution for variable1, variable2, and variable3:

adply(dat, 1, function(x)
    c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1),
      'variable2_p' = x$variable2 / sum(dat[x$ID == dat$ID,]$variable2),
      'variable3_p' = x$variable3 / sum(dat[x$ID == dat$ID,]$variable3)))
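
Since the question asks about the apply family, here is a sketch of a base R alternative that avoids repeating one line per variable. It is not part of the original answer: it uses lapply() over the columns and ave() to get the per-ID totals, and the vector name vars is only illustrative.

```r
# Same construction as the "Data" section of this answer.
set.seed(123)
dat <- data.frame(
  ID = rep(c(1, 2, 3), each = 3),
  trial = rep("a", 9),
  variable1 = sample(1:10, 9),
  variable2 = sample(1:10, 9),
  variable3 = sample(1:10, 9),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# columns to convert to proportions (hypothetical helper vector)
vars <- c("variable1", "variable2", "variable3")

# ave() returns, for every row, the sum of v within that row's ID;
# dividing by it gives the row's share of the ID total
dat[paste0(vars, "_p")] <- lapply(dat[vars], function(v) {
  v / ave(v, dat$ID, FUN = sum)
})
```

ave() accepts several grouping vectors, so conditioning on more columns would look like ave(v, dat$ID, dat$trial, FUN = sum).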

Data:

set.seed(123)
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(ID, trial,variable1,variable2,variable3,condition,
                  stringsAsFactors = FALSE)