JamAndJammies - 2 months ago 4x
R Question

Efficient way to calculate percentage change based on criteria

What is the most efficient and quickest way to calculate percentage change based on certain criteria? The only way I can think of is using

`aggregate`
function then loop through each name but I don't think it is efficient.

``````Dataframe 1
2015    Name
538 Johnny Appleseed
14  Johnny Appleseed
40  Johnny Appleseed
524 Johnny Appleseed
10  Sally Smith
50  Jenny Davids (There's no Jenny in 2016)

Dataframe 2
2016    Name
203 Johnny Appleseed
100 Sally Smith
50  Sally Smith
10  Sally Smith
242 Sally Smith
150 Johnny Appleseed
50  Sally Smith
``````

[[Update]]
Expected output example

• Johnny Appleseed -68.369%

• Sally Smith 442%

• Jenny Davids N/A

There are a few ways to accomplish what you're after. My preferred way would be to combine the 2015 and 2016 scores into a single "long" data set. This makes it easier to aggregate and transform data across different categories without loops.

First let's rename the existing columns, since numeric column names can give you problems down the line. We'll also add a "year" column to each data set.

``````colnames(df1) <- c('value', 'name')
colnames(df2) <- c('value', 'name')

df1\$year <- 2015
df2\$year <- 2016
``````

These can now be combined into a single data set:

``````df3 <- rbind(df1, df2)
``````

And finally, `dplyr` to the rescue:

``````library(dplyr)
df.change <- group_by(df3, name, year) %>%
summarize(value = mean(value)) %>%
ungroup %>% group_by(name) %>%
summarize(change = (value[year == 2016] - value[year == 2015]) / value[year == 2015])
``````

This should produce a data frame with columns for name and percent change.