JamAndJammies JamAndJammies - 2 months ago 4x
R Question

Efficient way to calculate percentage change based on criteria

What is the most efficient and quickest way to calculate percentage change based on certain criteria? The only way I can think of is using

function then loop through each name but I don't think it is efficient.

Dataframe 1
2015 Name
538 Johnny Appleseed
14 Johnny Appleseed
40 Johnny Appleseed
524 Johnny Appleseed
10 Sally Smith
50 Jenny Davids (There's no Jenny in 2016)

Dataframe 2
2016 Name
203 Johnny Appleseed
100 Sally Smith
50 Sally Smith
10 Sally Smith
242 Sally Smith
150 Johnny Appleseed
50 Sally Smith

Expected output example

  • Johnny Appleseed -68.369%

  • Sally Smith 442%

  • Jenny Davids N/A


There are a few ways to accomplish what you're after. My preferred way would be to combine the 2015 and 2016 scores into a single "long" data set. This makes it easier to aggregate and transform data across different categories without loops.

First let's rename the existing columns, since numeric column names can give you problems down the line. We'll also add a "year" column to each data set.

colnames(df1) <- c('value', 'name')
colnames(df2) <- c('value', 'name')

df1$year <- 2015
df2$year <- 2016

These can now be combined into a single data set:

df3 <- rbind(df1, df2)

And finally, dplyr to the rescue:

df.change <- group_by(df3, name, year) %>%
             summarize(value = mean(value)) %>%
             ungroup %>% group_by(name) %>%
             summarize(change = (value[year == 2016] - value[year == 2015]) / value[year == 2015])

This should produce a data frame with columns for name and percent change.