AMisra AMisra - 3 months ago 18
Python Question

groupby with overlapping intervals timeseries

I have a time series in a pandas DataFrame and I want to group it by index, but with overlapping groups, i.e. the groups are not disjoint. The header_sec column is the index.
Each group consists of a 2-second window.
Input DataFrame:

header_sec
1 17004 days 22:17:13
2 17004 days 22:17:13
3 17004 days 22:17:13
4 17004 days 22:17:13
5 17004 days 22:17:14
6 17004 days 22:17:14
7 17004 days 22:17:14
8 17004 days 22:17:14
9 17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17


My first group should have

1 17004 days 22:17:13
2 17004 days 22:17:13
3 17004 days 22:17:13
4 17004 days 22:17:13
5 17004 days 22:17:14
6 17004 days 22:17:14
7 17004 days 22:17:14
8 17004 days 22:17:14


The second group overlaps the first: it starts with the last half of the records from the final second of the previous group.

7 17004 days 22:17:14
8 17004 days 22:17:14
9 17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16


The third group follows the same pattern:

13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17


If I do groupby on index,

dfgroup=df.groupby(df.index)


this gives one group per second. What would be the best way to merge these groups?

Answer

Here is a technique:

import pandas as pd # the code below uses pd; skip this if you have already imported it

grouped = df.groupby(df.index)

for name, group in grouped:
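    # Passing the label in a list makes .loc always return a DataFrame,
    # even when only one row matches; a missing label raises KeyError.
    try:
        prev_sec = df.loc[[name - pd.to_timedelta(1, unit='s')], :]
    except KeyError:
        prev_sec = pd.DataFrame(columns=group.columns)  # no previous second
    try:
        next_sec = df.loc[[name + pd.to_timedelta(1, unit='s')], :]
    except KeyError:
        next_sec = pd.DataFrame(columns=group.columns)  # no next second
    Pn = 2 # replace this with len(prev_sec) // 2 to get half the rows from the previous second
    Nn = 2 # replace this with len(next_sec) // 2 to get half the rows from the next second
    # Last Pn rows of the previous second + current second + first Nn rows of the next second
    group = pd.concat([prev_sec.iloc[-Pn:,:], group, next_sec.iloc[:Nn,:]])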

    # Replace the below lines with your operations
    print(name, group)
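
For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds a frame shaped like the one in the question (four rows per second over five seconds) and wraps the loop in a function. The function name `overlapping_groups`, its parameters `n_prev` and `n_next`, and the `row` column are made up for illustration and are not part of pandas or of the original data. It uses a boolean mask instead of `.loc` with `try`/`except`, which is a small substitution, so a missing neighbouring second simply gives an empty frame. Note that this produces one padded group per second, which may or may not be exactly the windows the question asks for.

```python
import pandas as pd

# Sample data mirroring the question: four rows per second over five seconds.
# The "row" column is a hypothetical payload used only to identify the rows.
start = pd.Timedelta(days=17004, hours=22, minutes=17, seconds=13)
index = pd.TimedeltaIndex(
    [start + pd.Timedelta(seconds=s) for s in range(5) for _ in range(4)],
    name="header_sec",
)
df = pd.DataFrame({"row": range(1, 21)}, index=index)


def overlapping_groups(frame, n_prev=2, n_next=2):
    """Return {second: rows of that second padded with neighbouring rows}.

    Each group holds the last n_prev rows of the previous second, all rows
    of the current second, and the first n_next rows of the next second.
    """
    one_sec = pd.Timedelta(seconds=1)
    groups = {}
    for name, group in frame.groupby(level=0):
        # Boolean masks return an empty DataFrame when the second is absent.
        prev_sec = frame[frame.index == name - one_sec]
        next_sec = frame[frame.index == name + one_sec]
        # tail(0) / head(0) are empty, unlike iloc[-0:], which returns everything.
        groups[name] = pd.concat(
            [prev_sec.tail(n_prev), group, next_sec.head(n_next)]
        )
    return groups


groups = overlapping_groups(df)
for name, group in groups.items():
    # Replace this line with your own per-group operation.
    print(name, group["row"].tolist())
```

With four rows per second, the defaults `n_prev=2` and `n_next=2` take half of each neighbouring second; for uneven data you could compute those counts per group instead.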