Thanos Thanos - 7 months ago 9
Python Question

Groupby - How to apply result of logic check to all rows

I have a set that looks like this:

In [127]: df
Out[127]:
ID Date regular_entry
0 1 2014-01-31 12:13:14 True
1 2 2014-02-28 12:13:14 False
2 1 2014-03-31 12:13:14 True
3 1 2014-04-30 12:13:14 True
4 2 2014-05-31 12:13:14 False
5 2 2014-06-30 12:13:14 True
6 3 2014-07-31 12:13:14 False
7 3 2014-08-31 12:13:14 True
8 3 2014-09-30 12:13:14 False
9 1 2014-10-31 12:13:14 True


I need to find whether there are any rows, such that
'regular_entry' == False
, for each of the groups (if grouped by
'ID'
).

I am using
pandas.Series.all()
and
transform()
to achieve this - as shown below - and it works great:

In [134]: df['ever_irregular'] = df.groupby('ID')['regular_entry'].transform(lambda x: False if x.all() else True )

In [135]: df
Out[135]:
ID Date regular_entry ever_irregular
0 1 2014-01-31 12:13:14 True False
1 2 2014-02-28 12:13:14 False True
2 1 2014-03-31 12:13:14 True False
3 1 2014-04-30 12:13:14 True False
4 2 2014-05-31 12:13:14 False True
5 2 2014-06-30 12:13:14 True True
6 3 2014-07-31 12:13:14 False True
7 3 2014-08-31 12:13:14 True True
8 3 2014-09-30 12:13:14 False True
9 1 2014-10-31 12:13:14 True False


Now, I also need to find whether the last entry for each group (if grouped by
'ID'
and taking the values of
'Date'
in mind) had
'regular_entry' == False


I know I can get the last entry per group like this:

In [138]: df.sort_values(by='Date').groupby('ID').nth(-1)['regular_entry']
Out[138]:
ID
1 True
2 True
3 False
Name: regular_entry, dtype: bool


And I have by now figured out that I can try to join the two like this:

In [152]: df_new = pd.DataFrame(latest_row_regular).rename(columns={'regular_entry':'latest_regular'})

In [155]: pd.merge(df, df_new, left_on='ID', right_index=True).sort_values(by='Date')
Out[155]:
ID Date regular_entry ever_irregular latest_regular
0 1 2014-01-31 12:13:14 True False True
1 2 2014-02-28 12:13:14 False True True
2 1 2014-03-31 12:13:14 True False True
3 1 2014-04-30 12:13:14 True False True
4 2 2014-05-31 12:13:14 False True True
5 2 2014-06-30 12:13:14 True True True
6 3 2014-07-31 12:13:14 False True False
7 3 2014-08-31 12:13:14 True True False
8 3 2014-09-30 12:13:14 False True False
9 1 2014-10-31 12:13:14 True False True


This seems to be working fine, however, it does seem like the long way round. Is there some easier/faster way to get the values for each of the groups (as grouped after
groupby()
) and apply directly instead of following all the intermediate steps?

Thanks for the help!

Answer

I think you can use transform with iloc:

df['latest_regular'] = df.groupby('ID')['regular_entry'].transform(lambda x: x.iloc[-1])
print df
   ID                Date regular_entry latest_regular
0   1 2014-01-31 12:13:14          True           True
1   2 2014-02-28 12:13:14         False           True
2   1 2014-03-31 12:13:14          True           True
3   1 2014-04-30 12:13:14          True           True
4   2 2014-05-31 12:13:14         False           True
5   2 2014-06-30 12:13:14          True           True
6   3 2014-07-31 12:13:14         False          False
7   3 2014-08-31 12:13:14          True          False
8   3 2014-09-30 12:13:14         False          False
9   1 2014-10-31 12:13:14          True           True