gangi gangi - 2 months ago 4
Python Question

For a particular ID, check for every 3 mins and set a flag in python

Suppose I am having data as follows, The data has three columns ID, date and Time,

ID date Time
1 8/20/2016 20:27:25
1 8/20/2016 20:29:04
1 8/20/2016 20:29:05
1 8/20/2016 20:38:38
1 8/20/2016 20:38:38
2 8/20/2016 14:24:53
3 8/20/2016 21:18:37
3 8/20/2016 21:18:37
4 8/20/2016 01:08:34
4 8/20/2016 01:08:35
4 8/20/2016 01:08:35
4 8/20/2016 01:08:35
4 8/20/2016 11:08:35
4 8/20/2016 11:08:35
4 8/20/2016 11:08:35
5 8/20/2016 09:35:17
5 8/20/2016 09:36:05
5 8/20/2016 09:36:19
5 8/20/2016 09:36:21
5 8/20/2016 00:01:59
5 8/20/2016 00:04:59
6 8/20/2016 00:02:13
6 8/20/2016 00:02:17
6 8/20/2016 00:02:19
6 8/20/2016 00:02:21
6 8/20/2016 00:02:24
6 8/20/2016 00:02:26
6 8/20/2016 00:04:27
6 8/20/2016 00:04:27
6 8/20/2016 00:04:28
6 8/20/2016 00:04:30
6 8/20/2016 00:04:35
6 8/20/2016 01:45:23
7 8/20/2016 00:14:30
7 8/20/2016 00:14:33
7 8/20/2016 00:14:47
7 8/20/2016 00:14:56
7 8/20/2016 00:21:56


For every ID and particular date, I want to check the first time and then find out the entries corresponding to next 3 minutes and set a flag as 1. If the time is not within three 3 minutes, I want to set the flag to 2 and then keep checking for the next three minutes. Basically I want to find out three minute sets for every ID. I want to analyse the records for every three minutes and If there is a flag like this, It would help me in analyzing it.

the output I want is,

ID date Time flag
1 8/20/2016 20:27:25 1
1 8/20/2016 20:29:04 1
1 8/20/2016 20:29:05 1
1 8/20/2016 20:38:38 2
1 8/20/2016 20:38:38 2
2 8/20/2016 14:24:53 1
3 8/20/2016 21:18:37 1
3 8/20/2016 21:18:37 1
4 8/20/2016 01:08:34 1
4 8/20/2016 01:08:35 1
4 8/20/2016 01:08:35 1
4 8/20/2016 01:08:35 1
4 8/20/2016 11:08:35 2
4 8/20/2016 11:08:35 2
4 8/20/2016 11:08:35 2
5 8/20/2016 09:35:17 1
5 8/20/2016 09:36:05 1
5 8/20/2016 09:36:19 1
5 8/20/2016 09:36:21 1
5 8/20/2016 00:01:59 2
5 8/20/2016 00:04:59 3
6 8/20/2016 00:02:13 1
6 8/20/2016 00:02:17 1
6 8/20/2016 00:02:19 1
6 8/20/2016 00:02:21 1
6 8/20/2016 00:02:24 1
6 8/20/2016 00:02:26 1
6 8/20/2016 00:04:27 2
6 8/20/2016 00:04:27 2
6 8/20/2016 00:04:28 2
6 8/20/2016 00:04:30 2
6 8/20/2016 00:04:35 2
6 8/20/2016 01:45:23 3
7 8/20/2016 00:14:30 1
7 8/20/2016 00:14:33 1
7 8/20/2016 00:14:47 1
7 8/20/2016 00:14:56 1
7 8/20/2016 00:21:56 2


where for every ID, date, for every 3 minutes, 1 is added to the flag.

I am not able to try regarding this as I am new to python. I apologize for this. If anybody can give any idea of how to do this, it would be helpful for me.

sal sal
Answer

The first to do would be to put the data in useful data structure: I chose a list of sets. These records would first need to be grouped by their (id, date) which is an easy task using the groupby function of itertools.

If the data came from a database, you could run such grouping in the database itself.

Then all that's left is to loop in each of those groups, and anytime the time delta between the initial record and the next exceeds 3 minutes (3*60) increase the flag value and update the reference_time so that it moves along.

Full code below, although data is reduced, but you can also see the whole working example at https://eval.in/636448

Note: I believe your reference data has an error in dataset id=6.

from itertools import groupby
from datetime import datetime

data = [
('1', '8/20/2016', '20:27:25'),
 # ...
('7', '8/20/2016', '00:21:56')
]


FMT = '%H:%M:%S'
for key, group in groupby(data, lambda x: (x[0],x[1])):
    # group data based on (id, date), ignore 'key'

    reference_time = None
    flag = 1
    for cur_id, cur_date, cur_time in group:
        # for each group, test the delta-3 condition
        if reference_time is None:
            # init
            reference_time = cur_time
            print cur_id, cur_date, cur_time, flag
            continue
        delta = datetime.strptime(cur_time, FMT) - datetime.strptime(reference_time, FMT)
        if delta.seconds >= (3*60):
            # check if time diff is >= 3 minutes from start of sequence
            # increase flag, and update the reference timestamp
            reference_time = cur_time
            flag += 1
        print cur_id, cur_date, cur_time, flag

Result:

1 8/20/2016 20:27:25 1
1 8/20/2016 20:29:04 1
1 8/20/2016 20:29:05 1
1 8/20/2016 20:38:38 2
1 8/20/2016 20:38:38 2
2 8/20/2016 14:24:53 1
3 8/20/2016 21:18:37 1
3 8/20/2016 21:18:37 1
4 8/20/2016 01:08:34 1
4 8/20/2016 01:08:35 1
4 8/20/2016 01:08:35 1
4 8/20/2016 01:08:35 1
4 8/20/2016 11:08:35 2
4 8/20/2016 11:08:35 2
4 8/20/2016 11:08:35 2
5 8/20/2016 09:35:17 1
5 8/20/2016 09:36:05 1
5 8/20/2016 09:36:19 1
5 8/20/2016 09:36:21 1
5 8/20/2016 00:01:59 2
5 8/20/2016 00:04:59 3
6 8/20/2016 00:02:13 1
6 8/20/2016 00:02:17 1
6 8/20/2016 00:02:19 1
6 8/20/2016 00:02:21 1
6 8/20/2016 00:02:24 1
6 8/20/2016 00:02:26 1
6 8/20/2016 00:04:27 1
6 8/20/2016 00:04:27 1
6 8/20/2016 00:04:28 1
6 8/20/2016 00:04:30 1
6 8/20/2016 00:04:35 1
6 8/20/2016 01:45:23 2
7 8/20/2016 00:14:30 1
7 8/20/2016 00:14:33 1
7 8/20/2016 00:14:47 1
7 8/20/2016 00:14:56 1
7 8/20/2016 00:21:56 2
Comments