Rayan Sp - 1 month ago
R Question

Count number of occurrences within a time frame in R

This is a tough one for me. I have 3 months of data (up to 1M observations) in a data.frame with 2 columns:

Date_Time Number
12/1/2015 12:00:01 AM 92222222
12/1/2015 12:00:29 AM 32211111
12/1/2015 12:00:41 AM 22333333
12/1/2015 12:00:43 AM 12222222
..... .....
12/1/2015 9:00:02 AM 92222222
12/2/2015 12:00:02 AM 32211111


How can I count the occurrences (frequency) of each value in column "Number" within a time frame of 24 hours?

The expected result for the example above:

92222222 Freq: 2
32211111 Freq: 2
22333333 Freq: 1
12222222 Freq: 1
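A quick check on the sample alone, assuming only the six rows shown above (the "....." rows are not included): a plain frequency table already reproduces the expected result, because every repeat in this sample falls within 24 hours of that number's first call (92222222 about 9 hours later, 32211111 just under 24 hours later). So this sample by itself cannot distinguish a 24-hour count from an overall count; calls more than 24 hours apart would be needed for that.

```r
# The Number column of the six sample rows shown in the question
number <- c("92222222", "32211111", "22333333", "12222222",
            "92222222", "32211111")

# Overall frequency per number, ignoring the timestamps entirely
freq <- table(number)
print(freq)
```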


EDIT

A time frame of 24 hours refers to an interval of 24 hours; it doesn't mean from midnight to midnight. For example, if someone calls at 5 PM today and calls again at 3 PM the next day, this should be counted as 2.
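To make the distinction concrete, here is a small base-R sketch of the 5 PM / 3 PM example. The dates are hypothetical, and the timestamps are written in 24-hour ISO form and treated as UTC purely for illustration: grouping by calendar date puts the two calls on different days, while measuring elapsed time keeps them inside one 24-hour interval.

```r
# Two calls from the same (hypothetical) number: 5 PM one day, 3 PM the next day
t1 <- as.POSIXct("2015-12-01 17:00:00", tz = "UTC")
t2 <- as.POSIXct("2015-12-02 15:00:00", tz = "UTC")

# Midnight-to-midnight grouping: the calls fall on different calendar dates
same_day <- as.Date(t1) == as.Date(t2)

# Rolling 24-hour interval: only the elapsed time between the calls matters
gap_hours <- as.numeric(difftime(t2, t1, units = "hours"))
within_24h <- gap_hours <= 24   # so this pair counts as 2 calls

print(c(same_day = same_day, within_24h = within_24h))
print(gap_hours)
```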

Edit 2:
To be clearer, the objective of this analysis is to find the number of repeat calls to the call center within a 24-hour window.

For example, a customer called from contact number 01101111 on 1/Jan/2016 1:32:01 PM, then called again on 1/Jan/2016 1:59:43 PM, and finally called the next day on 2/Jan/2016 12:21:02 PM.
The frequency of 01101111 is considered to be "3" because the number is repeated 3 times in less than 24 hours.
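The phrase "repeated 3 times in less than 24 hours" can be read in more than one way. One possible reading, offered here as an assumption rather than as the asker's definition, is the largest number of calls from a number that fit inside any 24-hour window, not only the window that opens at the first call. A base-R sketch of that sliding-window count follows; `max_calls_in_24h` and the `spread` data are hypothetical, and the 01101111 timestamps are rewritten in 24-hour ISO form (UTC).

```r
# Largest number of calls from one number that fit inside ANY 24-hour window
max_calls_in_24h <- function(t) {
  s <- sort(as.numeric(t))                   # seconds since the epoch, ascending
  last <- findInterval(s + 24 * 60 * 60, s)  # index of the last call at most 24 h after each call
  max(last - seq_along(s) + 1)               # calls in the window opened by each call
}

# The three calls from 01101111 in Edit 2
calls_0110 <- as.POSIXct(c("2016-01-01 13:32:01",
                           "2016-01-01 13:59:43",
                           "2016-01-02 12:21:02"), tz = "UTC")
print(max_calls_in_24h(calls_0110))

# Hypothetical number whose first call is isolated from a later cluster
spread <- as.POSIXct(c("2016-01-01 00:00:00", "2016-01-02 12:00:00",
                       "2016-01-02 13:00:00", "2016-01-02 14:00:00"), tz = "UTC")
print(max_calls_in_24h(spread))                    # sliding window
print(sum(spread <= min(spread) + 24 * 60 * 60))   # window anchored at the first call
```

For the 01101111 example both readings agree, but for a pattern like `spread` they differ, so it is worth confirming which one the call-center analysis needs.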

tfc
Answer

Based on your comments, for any number the 24-hour period starts at the earliest call from that number. Below is the commented code:

library(lubridate)  # mdy_hms(), hours()
library(dplyr)      # group_by(), summarise(), %>%

calls <- structure(list(Date_Time = structure(1:6, .Label = c("12/1/2015 12:00:01 AM", 
"12/1/2015 12:00:29 AM", "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM", 
"12/1/2015 9:00:02 AM", "12/2/2015 12:00:02 AM"), class = "factor"), 
    Number = structure(c(4L, 3L, 2L, 1L, 4L, 3L), .Label = c("12222222", 
    "22333333", "32211111", "92222222"), class = "factor")), .Names = c("Date_Time", 
"Number"), row.names = c(NA, -6L), class = "data.frame")


count_freq <- function(timestamps){
    # Given all the occurrences of calls from a number, find the
    # earliest one and count how many occur within 24 hours of it
    dtime <- sort(mdy_hms(as.character(timestamps)))  # factor -> character -> date-time
    start_time <- dtime[1]
    end_time <- start_time + hours(24)
    sum(dtime >= start_time & dtime <= end_time)
}


out <- group_by(calls, Number) %>% 
       summarise(freq = count_freq(Date_Time))
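For readers without lubridate and dplyr, the same first-call rule can be sketched in base R. This is an equivalent reformulation, not the answer's original code; it assumes the timestamps are month/day/year on a 12-hour clock and that the session locale recognises "AM"/"PM" for `%p` (true for English and C locales). The helper name `count_from_first` is hypothetical.

```r
# Sample data from the question, as plain character columns
calls <- data.frame(
  Date_Time = c("12/1/2015 12:00:01 AM", "12/1/2015 12:00:29 AM",
                "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
                "12/1/2015 9:00:02 AM",  "12/2/2015 12:00:02 AM"),
  Number = c("92222222", "32211111", "22333333", "12222222",
             "92222222", "32211111"),
  stringsAsFactors = FALSE
)

# Parse month/day/year with a 12-hour clock; %p assumes an English or C locale
calls$time <- as.POSIXct(calls$Date_Time,
                         format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Same rule as count_freq(): calls within 24 hours of the number's first call
count_from_first <- function(t) {
  sum(t <= min(t) + 24 * 60 * 60)
}

# split() keeps the POSIXct class, so each group is a vector of date-times
freq <- sapply(split(calls$time, calls$Number), count_from_first)
print(freq)
```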