Kitira Kitira - 13 days ago 6
R Question

Adding a counter that increments while the next value is the same in R

I'm facing the following problem in R. I have a data frame with values identifying a customer, stored in a User ID column. I need to add another column with a counter giving the occurrence number of that particular customer in the data. The data frame is sorted by User ID, so I have something like this:

> niekonwersyjne[c(57:62,72:77),1]
User_ID
AMsySZa--1Og4WwseZJKRyABTWdh
AMsySZa--1Og4WwseZJKRyABTWdh
AMsySZa--1Og4WwseZJKRyABTWdh
AMsySZa--1Og4WwseZJKRyABTWdh
AMsySZa--1Og4WwseZJKRyABTWdh
AMsySZa--1qZghdxj4gypoSQRt_F
AMsySZa--2gL6xRCZFUCOXtpYxNs
AMsySZa--2gL6xRCZFUCOXtpYxNs
AMsySZa--2gL6xRCZFUCOXtpYxNs
AMsySZa--2gL6xRCZFUCOXtpYxNs
AMsySZa--2gL6xRCZFUCOXtpYxNs
AMsySZa--2gL6xRCZFUCOXtpYxNs


But I need something like this:

> niekonwersyjne[c(57:62,72:77),c(1,11)]
User_ID Counter
AMsySZa--1Og4WwseZJKRyABTWdh 1
AMsySZa--1Og4WwseZJKRyABTWdh 2
AMsySZa--1Og4WwseZJKRyABTWdh 3
AMsySZa--1Og4WwseZJKRyABTWdh 4
AMsySZa--1Og4WwseZJKRyABTWdh 5
AMsySZa--1qZghdxj4gypoSQRt_F 1
AMsySZa--2gL6xRCZFUCOXtpYxNs 1
AMsySZa--2gL6xRCZFUCOXtpYxNs 2
AMsySZa--2gL6xRCZFUCOXtpYxNs 3
AMsySZa--2gL6xRCZFUCOXtpYxNs 4
AMsySZa--2gL6xRCZFUCOXtpYxNs 5
AMsySZa--2gL6xRCZFUCOXtpYxNs 6


I can do this with a loop, but the data frame has over 20 million observations, so the calculation time is definitely too high. Is there some other way to achieve this result?

The loop that I am using right now looks like this:

niekonwersyjne$Counter <- 1

for (i in 2:nrow(niekonwersyjne)) {
    if (niekonwersyjne[i - 1, "User_ID"] == niekonwersyjne[i, "User_ID"]) {
        # Same customer as the previous row: continue the count
        niekonwersyjne[i, "Counter"] <- niekonwersyjne[i - 1, "Counter"] + 1
    } else {
        # New customer: restart the count
        niekonwersyjne[i, "Counter"] <- 1
    }
}

Answer

I find the data.table method quite nice:

library( data.table )
setDT( df )[ , counter := seq_len( .N ), by = User_ID ]

This "splits" the data into groups based on the by parameter (here User_ID) and adds a sequence to each group, running from 1 up to the number of rows in that group (.N).

Or with dplyr:

library( dplyr )
df <- df %>%
    group_by( User_ID ) %>%
    mutate( counter = seq_len( n() ) )
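
If installing a package is not an option, the same per-group sequence can probably be built in base R as well. The following is only a sketch on a small made-up data frame (the `df` contents and the column names `counter_ave` and `counter_rle` are assumptions for illustration, not part of the original data). The second variant relies on the rows already being sorted by User_ID, as described in the question.

```r
# Toy data standing in for the real User_ID column (hypothetical values)
df <- data.frame(
    User_ID = c("a", "a", "a", "b", "c", "c"),
    stringsAsFactors = FALSE
)

# Variant 1: ave() applies seq_along() within each User_ID group.
# It does not require the rows to be sorted.
df$counter_ave <- ave(seq_along(df$User_ID), df$User_ID, FUN = seq_along)

# Variant 2: assumes identical IDs sit in consecutive rows.
# rle() returns the length of each run of equal values, and
# sequence() expands each length n into 1, 2, ..., n.
df$counter_rle <- sequence(rle(df$User_ID)$lengths)

print(df)
# Both counter columns contain 1 2 3 1 1 2
```

On 20 million rows the run-length variant should avoid any explicit grouping step, but it gives wrong counters if the same ID appears in two separate runs, so it is only safe when the sort order holds.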