Jakub Langr - 12 days ago 8
R Question

# How to calculate time difference between datetimes, for each group (student-contract)?

I have a specific problem; I have data in the following format:

``````#   USER_ID SUBMISSION_DATE CONTRACT_REF
1        1       20/6 1:00         W001
2        1       20/6 2:00         W002
3        1       20/6 3:30         W003
4        4       20/6 4:00         W004
5        5       20/6 5:00         W005
6        5       20/6 6:00         W006
7        7       20/6 7:00         W007
8        7       20/6 8:00         W008
9        7       20/6 9:00         W009
10       7      20/6 10:00        W0010
``````

Now I need to somehow calculate the time difference between the different submissions (uniquely identifiable).

In other words:
I have a table of submissions, in this table, there are all submissions for all users. I need to find a way how to calculate the time difference for each unique STUDENT-CONTRACT tuple between nth assignment and the (n-1)th assignment.

Also note that each new user has to has zero for the new assignment. So the output would look as follows:

``````#   USER_ID SUBMISSION_DATE CONTRACT_REF  TIME_DIFFRENCE
1        1       20/6 1:00         W001                0
2        1       20/6 2:00         W002             3600
3        1       20/6 3:30         W003             5400
4        4       20/6 4:00         W004             3600
5        5       20/6 5:00         W005                0
6        5       20/6 6:00         W006             3600
7        7       20/6 7:00         W007                0
8        7       20/6 8:00         W008             3600
9        7       20/6 9:00         W009             3600
10       7      20/6 10:00        W0010             3600
``````

Note that the time may NOT be in seconds, but whatever is suitable.

My thoughts:
1) I presume this will require as.POSIXct somewhere so that R knows how to deal with the time
2) This may involve some package such as
`plyr`
, but I am so utterly lost in the documentation and examples are hard to find.

Thank you very much for all responses!

Best,
Jakub

Here's an attempt. Firstly, get the data:

``````dat <- read.csv(text="USER_ID,SUBMISSION_DATE,CONTRACT_REF
1,20/6 1:00,W001
1,20/6 2:00,W002
1,20/6 3:30,W003
4,20/6 4:00,W004
5,20/6 5:00,W005
5,20/6 6:00,W006
7,20/6 7:00,W007
7,20/6 8:00,W008
7,20/6 9:00,W009
``````

Get the number from the contract ref and sort the data

``````dat\$CR_NUM <- as.numeric(gsub("W","",dat\$CONTRACT_REF))
dat <- with(dat,dat[order(USER_ID,CR_NUM),])
``````

Convert the date to a POSIXct numeric representation

``````dat\$SD_DATE <- as.numeric(with(dat,as.POSIXct(SUBMISSION_DATE,format="%d/%m %H:%M")))
``````

Calculate a time difference with a 0 at the start using `ave`

``````dat\$TIME_DIFF <- with(dat, ave(SD_DATE, USER_ID, FUN=function(x) c(0,diff(x)) ))
``````

Result:

``````# not showing the calculated columns
dat[-c(4:5)]

USER_ID SUBMISSION_DATE CONTRACT_REF TIME_DIFF
1        1       20/6 1:00         W001         0
2        1       20/6 2:00         W002      3600
3        1       20/6 3:30         W003      5400
4        4       20/6 4:00         W004         0
5        5       20/6 5:00         W005         0
6        5       20/6 6:00         W006      3600
7        7       20/6 7:00         W007         0
8        7       20/6 8:00         W008      3600
9        7       20/6 9:00         W009      3600
10       7      20/6 10:00        W0010      3600
``````