have fun - 1 year ago 119
R Question

# convert data frame of counts to proportions by conditions in R

I need to expand on this question: convert data frame of counts to proportions in R

I need to calculate proportions by one condition while keeping the other columns of the dataset.

Reproducible example:

``````
ID <- rep(c(1, 2, 3), each = 3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i", "j", "k"), 3)
# No cbind() here: it would build a character matrix and the counts would stop being numeric
dat <- data.frame(ID, trial, variable1, variable2, variable3, condition)
``````

For each variable, I would like the proportion within each ID (i.e. computed separately for each of the 3 IDs).
Ideally, the new variables would be stored in the same data frame, e.g. as
`dat$variable1_p`

I know how to do this with a series of for loops, but I would like to learn how to use the apply functions, and to be able to extend the approach to more conditions if necessary.

We can use `adply` from the `plyr` package:

``````
library(plyr)
# For each row, divide its variable1 by the total of variable1 within the same ID
adply(dat, 1, function(x)
  c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1)))

#   ID trial variable1 variable2 variable3 condition variable1_p
# 1  1     a         3         5         4         i  0.20000000
# 2  1     a         8         9         9         j  0.53333333
# 3  1     a         4         4         8         k  0.26666667
# 4  2     a         7        10         5         i  0.50000000
# 5  2     a         6         8        10         j  0.42857143
# 6  2     a         1         1         7         k  0.07142857
# 7  3     a        10         6         3         i  0.47619048
# 8  3     a         9         7         6         j  0.42857143
# 9  3     a         2         3         2         k  0.09523810
``````

Another option is to use `dplyr`, which would handle cases where there is more than one row per condition per ID:

``````
library(dplyr)
dat %>%
  group_by(ID, condition) %>%
  mutate(sum_v1_cond = sum(variable1)) %>%
  ungroup() %>%
  group_by(ID) %>%
  mutate(variable1_p = sum_v1_cond / sum(variable1)) %>%
  select(-sum_v1_cond)
``````
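If there is exactly one row per ID and condition, as in the example data, the same idea can be extended to all three variables in one step. The following is only a sketch under that assumption: it relies on `across()`, which requires dplyr 1.0.0 or later, and it rebuilds the data from the values printed in the output above so that it runs on its own. The name `res` is just an illustrative choice.

```r
library(dplyr)

# Same values as the printed example above; one row per ID/condition is assumed
dat <- data.frame(
  ID        = rep(1:3, each = 3),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  variable2 = c(5, 9, 4, 10, 8, 1, 6, 7, 3),
  variable3 = c(4, 9, 8, 5, 10, 7, 3, 6, 2),
  condition = rep(c("i", "j", "k"), 3)
)

res <- dat %>%
  group_by(ID) %>%
  # Divide each value by its ID total and store it as <column>_p
  mutate(across(c(variable1, variable2, variable3),
                ~ .x / sum(.x),
                .names = "{.col}_p")) %>%
  ungroup()
```

Unlike the pipeline above, this gives each row's own share of its ID total, so it does not pool several rows of the same condition.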

### Edit - here's a full solution for `variable1`, `variable2`, and `variable3`:

``````
adply(dat, 1, function(x)
  c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1),
    'variable2_p' = x$variable2 / sum(dat[x$ID == dat$ID,]$variable2),
    'variable3_p' = x$variable3 / sum(dat[x$ID == dat$ID,]$variable3)))
``````
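Since the question asks about the apply family and about extending to more conditions, here is a possible base-R sketch using `lapply()` with `ave()`. The helper name `add_props` and its arguments are hypothetical, introduced only for illustration; the data values are copied from the printed output above.

```r
# Hypothetical helper: add a <var>_p column for each variable in `vars`,
# with proportions computed within the groups defined by the columns in `by`
add_props <- function(df, vars, by) {
  groups <- interaction(df[by], drop = TRUE)
  df[paste0(vars, "_p")] <- lapply(df[vars], function(v) {
    # ave() applies FUN within each group and returns a vector aligned with the rows
    ave(v, groups, FUN = function(g) g / sum(g))
  })
  df
}

dat <- data.frame(
  ID        = rep(1:3, each = 3),
  trial     = rep("a", 9),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  variable2 = c(5, 9, 4, 10, 8, 1, 6, 7, 3),
  variable3 = c(4, 9, 8, 5, 10, 7, 3, 6, 2),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

out <- add_props(dat, c("variable1", "variable2", "variable3"), by = "ID")
```

Passing several column names, for example `by = c("ID", "trial")`, would compute the proportions within each combination of those columns.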

### Data:

``````
set.seed(123)
ID <- rep(c(1, 2, 3), each = 3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i", "j", "k"), 3)
dat <- data.frame(ID, trial, variable1, variable2, variable3, condition,
                  stringsAsFactors = FALSE)
``````