Lucia Lucia - 2 months ago 17
R Question

cumsum unique item according to time

I want to cumsum unique userid according to time, here's the data

userid time
1 2014-10-01
1 2014-11-16
2 2014-11-14
1 2014-10-16
2 2014-11-16
3 2014-10-16


the number of unique id on 2014-10-16 is 2, the number of unique id up to 2014-11-14 is 3.

I want to return a matrix like this

userid time count
1 2014-10-01 1
1 2014-11-16 3
2 2014-11-14 3
1 2014-10-16 2
2 2014-11-16 3
3 2014-10-16 2


I guess I should use cumsum to do it. Actually, I found this thread is doing similar stuff Cumulative count of unique values in R

However, I follow the first answer of this thread but it doesn't work on my case

Answer

Since it seems that you're using data.table (from the look of your first version of the post), here's an approach using that package:

library(data.table)
setDT(dt) # just in case it's still a data.frame
dt[order(time), count := cumsum(!duplicated(userid))][,   
                count := max(count), by = time]
dt
#   userid       time count
#1:      1 2014-10-01     1
#2:      1 2014-11-16     3
#3:      2 2014-11-14     3
#4:      1 2014-10-16     2
#5:      2 2014-11-16     3
#6:      3 2014-10-16     2

After ordering the data by time, we compute the cumulative count of unique userid's and then get the maximum count per time.

Comments