wwl - 1 year ago 73
R Question

Creating a new column based on function

I have an Excel sheet containing employment numbers of each industry of each county in the US.

It looks like this:

``````County   Industry  Employees
a        1         49
a        2         1
b        1         4
b        2         19
...
``````

I want to calculate the Herfindahl-Hirschman index (HHI) of employment in each county. I'm using R.
Given some numbers, calculating the HHI is easy:

``````hhi <- function(x) {
# calculate sum
total <- sum(x)

# calculate share
share <- x*100/total

return(sum(share^2))

}
``````

So, for example, county 1 has a HHI of 9608 (= 98^2 + 2^2) and county 2 has a HHI of 7127.

But how can I create a new column with the HHI of that county?

You can use `dplyr`:

``````library(dplyr)
df %>% group_by(County) %>% mutate(HHI = sum((Employees/sum(Employees) * 100)^2))

# Source: local data frame [4 x 4]
# Groups: County [2]

#   County Industry Employees      HHI
#   <fctr>    <int>     <int>    <dbl>
# 1      a        1        50 9615.532
# 2      a        2         1 9615.532
# 3      b        1         4 7126.654
# 4      b        2        19 7126.654
``````

Or equivalently, use `data.table`:

``````setDT(df)[, HHI := sum((Employees/sum(Employees) * 100)^2), County][]
``````

With your own customized function `hhi`, since all the functions it calls are vectorized, you can directly use it with `mutate`:

``````df %>% group_by(County) %>% mutate(HHI = hhi(Employees))
``````

or:

``````setDT(df)[, HHI := hhi(Employees), County][]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download