wwl wwl - 2 months ago 14
R Question

Creating a new column based on function

I have an Excel sheet containing employment numbers of each industry of each county in the US.

It looks like this:

County Industry Employees
a 1 49
a 2 1
b 1 4
b 2 19
...


I want to calculate the Herfindahl-Hirschman index (HHI) of employment in each county. I'm using R.
Given some numbers, calculating the HHI is easy:

hhi <- function(x) {
# calculate sum
total <- sum(x)

# calculate share
share <- x*100/total

# add
return(sum(share^2))

}


So, for example, county 1 has a HHI of 9608 (= 98^2 + 2^2) and county 2 has a HHI of 7127.

But how can I create a new column with the HHI of that county?

Answer

You can use dplyr:

library(dplyr)
df %>% group_by(County) %>% mutate(HHI = sum((Employees/sum(Employees) * 100)^2))

# Source: local data frame [4 x 4]
# Groups: County [2]

#   County Industry Employees      HHI
#   <fctr>    <int>     <int>    <dbl>
# 1      a        1        50 9615.532
# 2      a        2         1 9615.532
# 3      b        1         4 7126.654
# 4      b        2        19 7126.654

Or equivalently, use data.table:

setDT(df)[, HHI := sum((Employees/sum(Employees) * 100)^2), County][]

With your own customized function hhi, since all the functions it calls are vectorized, you can directly use it with mutate:

df %>% group_by(County) %>% mutate(HHI = hhi(Employees))

or:

setDT(df)[, HHI := hhi(Employees), County][]
Comments