user673592 user673592 - 3 months ago 11
R Question

Normalize all rows with first element within group

Is there an elegant method to normalize a column with a group-specific norm with dplyr?

Example:
I have a data frame:

df = data.frame(year=c(1:2, 1:2),
group=c("a", "a", "b", "b"),
val=c(100, 200, 300, 900))


i.e:

year group val
1 1 a 100
2 2 a 200
3 1 b 300
4 2 b 900


I want to normalize
val
by the value in year=1 of the given group. Desired output:

year group val val_norm
1 1 a 100 1
2 2 a 200 2
3 1 b 300 1
4 2 b 900 3


e.g. in row 4 the norm = 300 (year==1 & group=="b") hence val_norm = 900/300 = 3.

I can achieve this by extracting a ancillary data frame with just norms and then doing a left join on the original data frame.

What is a more elegant way to achieve this without creating a temporary data frame?

Answer

We can group by 'group', then divide the 'val' by the 'val' where 'year' is 1 (year==1). Here, I am selecting the first observation (in case there are duplicate 'year' of 1 for each 'group').

library(dplyr)
df %>% 
    group_by(group) %>% 
    mutate(val_norm = val/val[year==1][1L])
#      year  group   val val_norm
#  <int> <fctr> <dbl>    <dbl>
#1     1      a   100        1
#2     2      a   200        2
#3     1      b   300        1
#4     2      b   900        3

If we need elegance and efficiency, data.table can be tried

library(data.table)
setDT(df)[, val_norm := val/val[year==1][1L] , by = group]
Comments