anshanno anshanno - 1 month ago 7
Python Question

Python find difference between dates in one pandas column?

I am trying to come up with a method to calculate session duration. My sample data is below. I made the assumption that if someone logs in again - they start a new session and thus the previous session should have ended. So I will use login through the action before the user logs in again as session duration.

Action,Duration,_time,User
getForeignBugs,3,2016-11-07 15:45:18.992,savaithi
getServiceRequests,5,2016-11-07 15:45:18.902,savaithi
login,8088,2016-11-07 15:45:18.804,savaithi
getAuditTrail,550,2016-11-07 15:45:10.627,savaithi
getEnclosures,447,2016-11-07 15:45:09.994,savaithi
login,4810,2016-11-07 15:45:09.040,savaithi
getNoteTemplates,2,2016-11-07 15:45:04.220,savaithi
getQuickSearchInitInfo2,3,2016-11-07 15:45:01.995,savaithi
getQuickSearchInitInfo,3,2016-11-07 15:45:01.873,savaithi
login,0,2016-11-07 15:45:00.979,savaithi
getUserPreferences,2,2016-11-07 15:45:00.958,savaithi
getUserPreferences,2,2016-11-07 15:45:00.956,savaithi
SecurityServiceImpl.constructFromSession,2,2016-11-07 15:45:00.954,savaithi
setBooleanPreference,2,2016-11-07 15:45:00.954,savaithi
login,0,2016-11-07 15:45:00.658,savaithi
getPreference,1,2016-11-07 15:45:00.582,savaithi
getUserPreferences,129,2016-11-07 15:44:52.376,savaithi
login,2,2016-11-07 15:44:52.246,savaithi


How would I dynamically access data between login and login[index-1]?

for below example I want to use
getPreference,1,2016-11-07 15:45:00.582
-
login,2,2016-11-07 15:44:52.246


login,0,2016-11-07 15:45:00.658,savaithi
getPreference,1,2016-11-07 15:45:00.582,savaithi
getUserPreferences,129,2016-11-07 15:44:52.376,savaithi
login,2,2016-11-07 15:44:52.246,savaithi

Answer

IIUC you can do it this way:

first let's sort the DF:

In [71]: x = df.sort_values(['User','_time']).reset_index()

In [72]: x
Out[72]:
    index                                    Action  Duration                   _time      User
0      17                                     login         2 2016-11-07 15:44:52.246  savaithi
1      16                        getUserPreferences       129 2016-11-07 15:44:52.376  savaithi
2      15                             getPreference         1 2016-11-07 15:45:00.582  savaithi
3      14                                     login         0 2016-11-07 15:45:00.658  savaithi
4      12  SecurityServiceImpl.constructFromSession         2 2016-11-07 15:45:00.954  savaithi
5      13                      setBooleanPreference         2 2016-11-07 15:45:00.954  savaithi
6      11                        getUserPreferences         2 2016-11-07 15:45:00.956  savaithi
7      10                        getUserPreferences         2 2016-11-07 15:45:00.958  savaithi
8       9                                     login         0 2016-11-07 15:45:00.979  savaithi
9       8                    getQuickSearchInitInfo         3 2016-11-07 15:45:01.873  savaithi
10      7                   getQuickSearchInitInfo2         3 2016-11-07 15:45:01.995  savaithi
11      6                          getNoteTemplates         2 2016-11-07 15:45:04.220  savaithi
12      5                                     login      4810 2016-11-07 15:45:09.040  savaithi
13      4                             getEnclosures       447 2016-11-07 15:45:09.994  savaithi
14      3                             getAuditTrail       550 2016-11-07 15:45:10.627  savaithi
15      2                                     login      8088 2016-11-07 15:45:18.804  savaithi
16      1                        getServiceRequests         5 2016-11-07 15:45:18.902  savaithi
17      0                            getForeignBugs         3 2016-11-07 15:45:18.992  savaithi

now let's filer only those rows where Action == 'login' or those where next.Action == 'login'

In [73]: x.loc[(x.Action == 'login') | (x.Action.shift(-1) == 'login')]
Out[73]:
    index              Action  Duration                   _time      User
0      17               login         2 2016-11-07 15:44:52.246  savaithi
2      15       getPreference         1 2016-11-07 15:45:00.582  savaithi
3      14               login         0 2016-11-07 15:45:00.658  savaithi
7      10  getUserPreferences         2 2016-11-07 15:45:00.958  savaithi
8       9               login         0 2016-11-07 15:45:00.979  savaithi
11      6    getNoteTemplates         2 2016-11-07 15:45:04.220  savaithi
12      5               login      4810 2016-11-07 15:45:09.040  savaithi
14      3       getAuditTrail       550 2016-11-07 15:45:10.627  savaithi
15      2               login      8088 2016-11-07 15:45:18.804  savaithi

In [74]: x.loc[(x.Action == 'login') | (x.Action.shift(-1) == 'login'), '_time'].diff()
Out[74]:
0                NaT
2    00:00:08.336000
3    00:00:00.076000
7    00:00:00.300000
8    00:00:00.021000
11   00:00:03.241000
12   00:00:04.820000
14   00:00:01.587000
15   00:00:08.177000
Name: _time, dtype: timedelta64[ns]