Thanos - 11 months ago 51

Python Question

I have the following dataframe:

`In [372]: df_2`

Out[372]:

A ID3 DATETIME

0 B-028 b76cd912ff 2014-10-08 13:43:27

1 B-054 4a57ed0b02 2014-10-08 14:26:19

2 B-076 1a682034f8 2014-10-08 14:29:01

3 B-023 b76cd912ff 2014-10-08 18:39:34

4 B-023 f88g8d7sds 2014-10-08 18:40:18

5 B-033 b76cd912ff 2014-10-08 18:44:30

6 B-032 b76cd912ff 2014-10-08 18:46:00

7 B-037 b76cd912ff 2014-10-08 18:52:15

8 B-046 db959faf02 2014-10-08 18:59:59

9 B-053 b76cd912ff 2014-10-08 19:17:48

10 B-065 b76cd912ff 2014-10-08 19:21:38

And I want to find the time difference between successive entries, grouped by `'ID3'`. I am trying to use `transform()` on the `GroupBy`:

`In [379]: df_2['diff'] = df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].transform(lambda x: x.diff()); df_2['diff']`

Out[379]:

0 NaT

1 NaT

2 NaT

3 1970-01-01 04:56:07

4 NaT

5 1970-01-01 00:04:56

6 1970-01-01 00:01:30

7 1970-01-01 00:06:15

8 NaT

9 1970-01-01 00:25:33

10 1970-01-01 00:03:50

Name: diff, dtype: datetime64[ns]

I have also tried `x.diff().astype(int)` in the `lambda`.

The dtype of both `'DATETIME'` and `'diff'` is `datetime64[ns]`.

What I want to achieve is to have `diff` expressed in seconds. I have figured out that I can convert `df_2['diff']` to `TimeDelta` and then take the total seconds:

`In [405]: df_2['diff'] = pd.to_timedelta(df_2['diff']).map(lambda x: x.total_seconds()); df_2['diff']`

Out[407]:

0 NaN

1 NaN

2 NaN

3 17767.0

4 NaN

5 296.0

6 90.0

7 375.0

8 NaN

9 1533.0

10 230.0

Name: diff, dtype: float64

Is there a way to achieve this (seconds as values for `df_2['diff']`) directly inside `transform`?

To make it clear, I have already tried doing the conversion to `TimeDelta` inside `transform`.

Thanks for the help!

Answer

**UPDATE:** `transform()` from `class NDFrameGroupBy(GroupBy)` doesn't seem to do any downcasting and works as expected:

```
In [220]: (df_2[['ID3','DATETIME']]
.....: .sort_values(by='DATETIME')
.....: .groupby('ID3')
.....: .transform(lambda x: x.diff().dt.total_seconds())
.....: )
Out[220]:
DATETIME
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
```
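For anyone who wants to try the frame-level `transform()` outside an IPython session, here is a minimal self-contained sketch. It assumes a reasonably recent pandas and uses only four rows copied from the question's sample data; the names `df_small` and `seconds` are made up for illustration.

```python
import pandas as pd

# Four rows copied from the question's sample data (two distinct ID3 values).
df_small = pd.DataFrame({
    "ID3": ["b76cd912ff", "b76cd912ff", "f88g8d7sds", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27",
        "2014-10-08 18:39:34",
        "2014-10-08 18:40:18",
        "2014-10-08 18:44:30",
    ]),
})

# Group the two-column frame (not a single selected column) and transform it.
# The lambda receives each group's DATETIME column as a Series.
seconds = (
    df_small[["ID3", "DATETIME"]]
    .sort_values(by="DATETIME")
    .groupby("ID3")
    .transform(lambda x: x.diff().dt.total_seconds())
)

# The result is a DataFrame with one float column named DATETIME;
# the first row of every group is NaN because it has no predecessor.
print(seconds)
print(seconds["DATETIME"].dtype)
```

The only difference from the question's attempt is that the column is not selected with `['DATETIME']` before calling `transform()`, so the call goes through the frame-level code path.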

The `transform()` from `class SeriesGroupBy(GroupBy)`, on the other hand, tries to do the following:

```
result = _possibly_downcast_to_dtype(result, dtype)
```

which could (I'm not sure) be the cause of your problem.

**OLD answer:**

Try this:

```
In [168]: df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].diff().dt.total_seconds()
Out[168]:
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
dtype: float64
```
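As a runnable version of the same idea, the sketch below rebuilds the question's dataframe and writes the result back into a `diff` column. It assumes a reasonably recent pandas; the data is copied from the question, and nothing else is taken from the original session.

```python
import pandas as pd

# Sample data copied from the question.
df_2 = pd.DataFrame({
    "A": ["B-028", "B-054", "B-076", "B-023", "B-023", "B-033",
          "B-032", "B-037", "B-046", "B-053", "B-065"],
    "ID3": ["b76cd912ff", "4a57ed0b02", "1a682034f8", "b76cd912ff",
            "f88g8d7sds", "b76cd912ff", "b76cd912ff", "b76cd912ff",
            "db959faf02", "b76cd912ff", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27", "2014-10-08 14:26:19", "2014-10-08 14:29:01",
        "2014-10-08 18:39:34", "2014-10-08 18:40:18", "2014-10-08 18:44:30",
        "2014-10-08 18:46:00", "2014-10-08 18:52:15", "2014-10-08 18:59:59",
        "2014-10-08 19:17:48", "2014-10-08 19:21:38",
    ]),
})

# Per-group difference between successive timestamps, converted to seconds.
# GroupBy.diff() yields timedeltas, so no transform() or lambda is needed.
df_2["diff"] = (
    df_2.sort_values(by="DATETIME")
        .groupby("ID3")["DATETIME"]
        .diff()
        .dt.total_seconds()
)

print(df_2)
```

The assignment aligns on the index, so the values land on the right rows even if sorting had reordered the frame.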