EdM - 24 days ago 5x
R Question

# Cumulative Sum of a division with varying denominators R

Ok, here is the problem that I would love to solve using an efficient, elegant solution such as data.table or dplyr.

Define:

``````DT = data.table(group=c(rep("A",3),rep("B",5)),value=c(2,9,2,3,4,1,0,3))

time group value
1:    1     A     2
2:    2     A     9
3:    3     A     2
4:    1     B     3
5:    2     B     4
6:    3     B     1
7:    4     B     0
8:    5     B     3
``````

What I'm trying to get is a cumulative sum by group of the values divided by their inverse ordering at the moment of time they've been observed.

``````   time group value    RESULT
1:    1     A     2  2.000000
2:    2     A     9 10.000000
3:    3     A     2  7.166667
4:    1     B     3  3.000000
5:    2     B     4  5.500000
6:    3     B     1  4.000000
7:    4     B     0  2.583333
8:    5     B     3  4.933333
``````

At line 5 the result is:
`4/1 + 3/2 = 5.5`

Because at time 2, the group B had 2 observations, the last is divided by 1 and the previous by 1.
Next in line 6 the result is:
`1/1 + 4/2+ 3/3 = 4`

Since at time 3, the group B had 3 observations, the last is divided by 1, the previous by 2 and the still previous by 3. At line 7,
`0/1 + 1/2 + 4/3 + 3/4 = 2.583333`
, and so on...

The data is big, so avoiding loops is essential!

I'd use matrix algebra:

``````n_max = DT[, .N, by=group][, max(N)]
m     = matrix(0, n_max, n_max)
m[]   = ifelse( col(m) >= row(m), 1 / (col(m) - row(m) + 1 ), m)

DT[, res := value %*% m[seq_len(.N), seq_len(.N)], by=group ]

group value       res
1:     A     2  2.000000
2:     A     9 10.000000
3:     A     2  7.166667
4:     B     3  3.000000
5:     B     4  5.500000
6:     B     1  4.000000
7:     B     0  2.583333
8:     B     3  4.933333
``````