Lloyd Christmas - 2 months ago
R Question

Perform operations on a data frame based on a factor

I'm having a hard time describing this, so it's best explained with an example (as you can probably tell from the poor question title).

Using dplyr, I have the result of a group_by and summarize: a data frame that I want to manipulate further by factor.

As an example, here's a data frame that looks like the result of my dplyr operations:

> df <- data.frame(run=as.factor(c(rep(1,3), rep(2,3))),
group=as.factor(rep(c("a","b","c"),2)),
sum=c(1,8,34,2,7,33))
> df
run group sum
1 1 a 1
2 1 b 8
3 1 c 34
4 2 a 2
5 2 b 7
6 2 c 33


I want to divide sum by a value that depends on run. For example, if I have:

> total <- data.frame(run=as.factor(c(1,2)),
total=c(45,47))
> total
run total
1 1 45
2 2 47


Then my final data frame will look like this:

> df
run group sum percent
1 1 a 1 1/45
2 1 b 8 8/45
3 1 c 34 34/45
4 2 a 2 2/47
5 2 b 7 7/47
6 2 c 33 33/47


Here I wrote the fractions in the percent column by hand to show the operation I want to perform.

I know there is probably some dplyr way to do this with mutate, but I can't seem to figure it out right now. How would this be accomplished?

Answer

(In base R)

You can use total as a look-up table, indexing it by df$run to get the total for each row of df:

total[df$run,'total']
[1] 45 45 45 47 47 47
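
One caveat worth keeping in mind: R treats a factor used as an index as its integer level codes, so the lookup above depends on the rows of total being in the same order as the levels of df$run. If that order cannot be guaranteed, a lookup by value with match is one possible alternative. The following is a minimal sketch, not part of the original answer; the reversed row order of total and the names idx and run_total are assumptions made for illustration.

```r
# Same example data as in the question
df <- data.frame(run   = factor(c(rep(1, 3), rep(2, 3))),
                 group = factor(rep(c("a", "b", "c"), 2)),
                 sum   = c(1, 8, 34, 2, 7, 33))

# Assumption for illustration: total is listed in reverse order,
# so its row positions no longer line up with the levels of df$run
total <- data.frame(run = factor(c(2, 1)), total = c(47, 45))

# Find, for each row of df, the row of total with the same run label
idx <- match(as.character(df$run), as.character(total$run))

# Pull the matching totals (hypothetical helper vector)
run_total <- total$total[idx]
```

With the data above, run_total holds one total per row of df regardless of how total is sorted, and can be used as the divisor in the same way as the factor-indexed lookup.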

Then simply divide sum by that lookup and assign the result to a new column:

df$percent <- df$sum / total[df$run,'total']

  run group sum    percent
1   1     a   1 0.02222222
2   1     b   8 0.17777778
3   1     c  34 0.75555556
4   2     a   2 0.04255319
5   2     b   7 0.14893617
6   2     c  33 0.70212766
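
Since the question asked about a dplyr approach with mutate, here is a minimal sketch of how that might look, assuming the dplyr package is installed. It joins total onto df by run and then computes the ratio. The name result is an assumption made for illustration, and this is one possible approach rather than the only one.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(run   = factor(c(rep(1, 3), rep(2, 3))),
                 group = factor(rep(c("a", "b", "c"), 2)),
                 sum   = c(1, 8, 34, 2, 7, 33))
total <- data.frame(run = factor(c(1, 2)), total = c(45, 47))

result <- df %>%
  left_join(total, by = "run") %>%   # attach each run's total to its rows
  mutate(percent = sum / total) %>%  # divide the sum column by the joined total
  select(-total)                     # drop the helper column again
```

Because the join matches on the values of run rather than on row position, this does not depend on the order of the rows in total.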