user113531 user113531 -4 years ago 179
Python Question

Pandas get average time interval within groups

I have a DataFrame containing an

EffectiveDate
column.
I want to groupby the DataFrame by a Key value and then calculate the average time interval for all the date values in each group for the
EffectiveDate
column.

For example for the DataFrame:

EffectiveDate
1 2015-08-17 07:00:00
1 2015-08-18 07:00:00
1 2015-08-19 07:00:00
2 2015-08-20 07:00:00
2 2015-08-21 07:00:00
2 2015-09-16 07:00:00
2 2015-10-15 07:00:00
2 2015-11-16 08:00:00


I want to groupby the Index and calculate the average interval between the date values in the EffectiveDate column.

15199 2015-08-17 07:00:00
15214 2015-08-18 07:00:00
15219 2015-08-19 07:00:00
15233 2015-08-20 07:00:00
15254 2015-08-21 07:00:00
15687 2015-09-16 07:00:00
199 2015-10-15 07:00:00
1123 2015-11-16 08:00:00
Name: EffectiveDate, dtype: datetime64[ns]


On a single Series this seems to work fine:

EffectiveDate.diff().astype('timedelta64[s]').mean()


However when I am using the same function as a groupby aggregate on a pandas DataFrame:

df.groupby('Key').agg({
'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
})


The results are

EffectiveDate
1 1970-01-01 00:00:00.016747425
2 1970-01-01 00:00:00.017765280
3 1970-01-01 00:00:00.034776096
4 1970-01-01 00:00:00.002052450
5 1970-01-01 00:00:00.018238800
6 1970-01-01 00:00:00.024005438
7 1970-01-01 00:00:00.012330000


I would expect an integer field in each column. I am using Pandas
0.19.2
.

Answer Source

GroupBy.agg seems to attempt to cast back to the original dtype of the EffectiveDate column in 0.19.2. This might make sense generally I think, as we would expect an aggregation down the column to have the same dtype.

To fix this issue, you could use GroupBy.apply instead in 0.19.2, which doesn't perform the same cast afterwards.

df.groupby(df.index).apply(
    lambda x: x.diff().astype('timedelta64[s]').mean()
)

Seemingly this didn't used to be the case, as I can reproduce your behavior in 0.18.1 only after casting to the original dtype of the EffectiveDate column.

In 0.18.1

>>> df
        EffectiveDate
1 2015-08-17 07:00:00
1 2015-08-18 07:00:00
1 2015-08-19 07:00:00
2 2015-08-20 07:00:00
2 2015-08-21 07:00:00
2 2015-09-16 07:00:00
2 2015-10-15 07:00:00
2 2015-11-16 08:00:00

>>> df.groupby(df.index).agg({
        'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
})

   EffectiveDate
1        86400.0
2      1901700.0

>>> df.groupby(df.index).agg({
        'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
}).astype(df.EffectiveDate.dtype)

                  EffectiveDate
1 1970-01-01 00:00:00.000086400
2 1970-01-01 00:00:00.001901700
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download