Canovice Canovice - 3 months ago 9
R Question

R - calculating the average value of a dataframe column from the top row to bottom row

The title may not be that clear, since it was difficult to summarize the problem in a few words, although I don't think the problem is that difficult to solve. To explain the problem, let me share a dataframe for reference:

head(df, n = 10)

team score
1 A 10
2 A 4
3 A 10
4 A 16
5 A 20
6 B 5
7 B 11
8 B 8
9 B 16
10 B 5


I'd like to add a third column, that calculates the average score for each team, with the average score updating as I go down the rows for each team, and then resetting at a new team. For example, the output column I am hoping for would look like this:

head(df, n = 10)

team score avg_score
1 A 10 10
2 A 4 7
3 A 10 8
4 A 16 10
5 A 20 12
6 B 5 5
7 B 11 8
8 B 8 8
9 B 16 10
10 B 5 9


# row1: 10 = 10
# row2: 7 = (10 + 4)/2
# row3: 8 = (10 + 4 + 10)/3
# ...


with the pattern following, and the calculation restarting for a new team.

Thanks,

Answer
library("data.table")
setDT(df)[, `:=` (avg_score = cumsum(score)/1:.N), by = team]

or more readable as per the comment by @snoram

setDT(dt)[, avg_score := cumsum(score)/(1:.N), by = team]

#    team score avg_score
# 1:    A    10        10
# 2:    A     4         7
# 3:    A    10         8
# 4:    A    16        10
# 5:    A    20        12
# 6:    B     5         5
# 7:    B    11         8
# 8:    B     8         8
# 9:    B    16        10
# 10:    B     5         9
Comments