user673592 - 1 year ago 74
R Question

# Normalize all rows with first element within group

Is there an elegant method to normalize a column with a group-specific norm with dplyr?

Example:
I have a data frame:

``````df = data.frame(year=c(1:2, 1:2),
group=c("a", "a", "b", "b"),
val=c(100, 200, 300, 900))
``````

i.e:

``````  year group val
1    1     a 100
2    2     a 200
3    1     b 300
4    2     b 900
``````

I want to normalize
`val`
by the value in year=1 of the given group. Desired output:

``````   year group val val_norm
1    1     a 100        1
2    2     a 200        2
3    1     b 300        1
4    2     b 900        3
``````

e.g. in row 4 the norm = 300 (year==1 & group=="b") hence val_norm = 900/300 = 3.

I can achieve this by extracting a ancillary data frame with just norms and then doing a left join on the original data frame.

What is a more elegant way to achieve this without creating a temporary data frame?

We can group by 'group', then divide the 'val' by the 'val' where 'year' is 1 (`year==1`). Here, I am selecting the first observation (in case there are duplicate 'year' of 1 for each 'group').

``````library(dplyr)
df %>%
group_by(group) %>%
mutate(val_norm = val/val[year==1][1L])
#      year  group   val val_norm
#  <int> <fctr> <dbl>    <dbl>
#1     1      a   100        1
#2     2      a   200        2
#3     1      b   300        1
#4     2      b   900        3
``````

If we need elegance and efficiency, `data.table` can be tried

``````library(data.table)
setDT(df)[, val_norm := val/val[year==1][1L] , by = group]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download