mysterious_guy - 3 months ago 17
Python Question

Pandas groupby.apply not yielding desired output

Below is a small sample of my DataFrame. I am trying a groupby.apply, but it is not giving me the desired result.

In [204]: df1
Out[204]:
Location_ID Terminal Time
0 10000001405702 *WhF 2016-07-01 13:56:00
1 10000001405702 @W1n 2016-07-01 09:14:39
2 10000001405702 *Wu3 2016-07-01 11:54:52
3 10000001405702 @WJo 2016-07-01 11:30:57
4 10000001405702 @WCg 2016-07-01 11:06:24
5 10000001405702 *WL2 2016-07-01 10:04:20
6 10000001201132 A24O 2016-07-01 14:28:39
7 10000000564967 2JT1 2016-07-01 03:46:31
8 10000000615068 A125 2016-07-01 21:58:33
9 10000000552415 5MTH 2016-07-01 05:51:39
10 10000001405702 *WqW 2016-07-01 00:09:06
11 10000000250413 FF41 2016-07-01 02:59:43
12 10000001125037 WQ2I 2016-06-30 14:03:57
13 10000000174015 H5NM 2016-06-30 05:56:09
14 10000001856529 AR7K 2016-06-30 18:53:05


With the groupby.apply below, I lose the Location_ID and Terminal information, which I need.

In [206]: df1.groupby(['Location_ID','Terminal'])['Time'].apply(lambda x : x.diff()<=dt.timedelta(seconds=60))
Out[206]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False


I need output in the format below, so that the boolean value is known for each Location_ID and Terminal.

In [211]: df3
Out[211]:
Time
Location_ID Terminal
10000000000081 3ZR1 False
CDE1 True
CDE4 False
GIG2 True
L43L False
L43W False
W9YE True
YIW1 False
YIW4 True
ZYI7 True
ZYJN False
10000000000086 A1E6 False
A4DG True


I am still trying to get a grip on pandas. Thanks in advance.

vmg
Answer

The result of your operation is a pandas Series, so it carries only the boolean values and the row index. If you want it to be a column in a DataFrame, you need to assign it to one.

Make df3 a copy of df1 and change your call to:

df3['Time'] = df1.groupby(['Location_ID','Terminal'])['Time'].apply(lambda x : x.diff()<=dt.timedelta(seconds=60))
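As a minimal sketch of that assignment, here is a variant that uses the grouped diff() instead of apply with a lambda. The sample rows and the column name Within_60s are made up for illustration and are not from the question. One caveat, as far as I know: on newer pandas releases apply may prepend the group keys to the index of the result, in which case the assignment no longer lines up with the original rows; the grouped diff() keeps the original row index, so the assignment aligns row by row.

```python
import pandas as pd

# Hypothetical sample: three events on one terminal, one event on another.
df1 = pd.DataFrame({
    "Location_ID": [10000001405702, 10000001405702, 10000001405702, 10000001201132],
    "Terminal": ["*WhF", "*WhF", "*WhF", "A24O"],
    "Time": pd.to_datetime([
        "2016-07-01 13:56:00",
        "2016-07-01 13:56:30",
        "2016-07-01 14:10:00",
        "2016-07-01 14:28:39",
    ]),
})

df3 = df1.copy()

# Time elapsed since the previous row of the same (Location_ID, Terminal) group.
# The first row of each group has no predecessor, so its gap is NaT.
gap = df1.groupby(["Location_ID", "Terminal"])["Time"].diff()

# NaT compares as False, so the first row of each group is flagged False.
# "Within_60s" is an illustrative column name; the key columns stay in place.
df3["Within_60s"] = gap <= pd.Timedelta(seconds=60)

print(df3)
```

Note that a group with a single row can only ever produce False here, which may be why the small sample in the question shows False everywhere.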

Also, you apparently want 'Location_ID' and 'Terminal' as the index of the DataFrame rather than as ordinary columns.
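Moving the two key columns into the index can be sketched with set_index. The sample rows are again made up. The second half rests on an assumption about what the question wants: the desired output shows one row per (Location_ID, Terminal) pair, so I am guessing that a pair should be True if any of its consecutive events are at most 60 seconds apart. If a different rule is intended, swap any() for the appropriate aggregation.

```python
import pandas as pd

# Hypothetical sample: three events on one terminal, one event on another.
df1 = pd.DataFrame({
    "Location_ID": [10000001405702, 10000001405702, 10000001405702, 10000001201132],
    "Terminal": ["*WhF", "*WhF", "*WhF", "A24O"],
    "Time": pd.to_datetime([
        "2016-07-01 13:56:00",
        "2016-07-01 13:56:30",
        "2016-07-01 14:10:00",
        "2016-07-01 14:28:39",
    ]),
})

keys = ["Location_ID", "Terminal"]

# Row-wise flag: True when the previous event in the same group is <= 60 s earlier.
within = df1.groupby(keys)["Time"].diff() <= pd.Timedelta(seconds=60)

# Row-level result: one row per event, keyed by (Location_ID, Terminal).
row_level = df1.assign(Time=within).set_index(keys)

# Group-level result (assumed rule): one boolean per pair, True if any row is True.
group_level = (
    within.groupby([df1[k] for k in keys])
    .any()
    .to_frame("Time")
)

print(row_level)
print(group_level)
```

The group-level frame has the same shape as the df3 shown in the question: a two-level index of Location_ID and Terminal with a single boolean Time column.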
