ghurpost ghurpost - 15 days ago 5
Python Question

How to efficiently get indices of rows of DataFrame, where these rows meet certain cumulative criteria?

For example I would like to get letters indicating a row where period of at least two consecutive drops in other column begins.

Exemplary data:

a b
0 3 a
1 2 b
2 3 c
3 2 d
4 1 e
5 0 f
6 -1 g
7 3 h
8 1 i
9 0 j


Exemplary solution with simple loop:

import pandas as pd

df = pd.DataFrame({'a': [3,2,3,2,1,0,-1,3,1,0], 'b': list('abcdefghij')})

less = 0
l = []
prev_prev_row = df.iloc[0]
prev_row = df.iloc[1]
if prev_row['a'] < prev_prev_row['a']: less = 1
for i, row in df.iloc[2:len(df)].iterrows():
if row['a'] < prev_row['a']:
less = less + 1
else:
less = 0
if less == 2:
l.append(prev_prev_row['b'])
prev_prev_row = prev_row
prev_row = row


This gives list
l
:

['c', 'h']

Answer

use rolling(2) in reverse

s = df.a[::-1].diff().gt(0).rolling(2).sum().eq(2)
df.b.loc[s & (s != s.shift(-1))]

2    c
7    h
Name: b, dtype: object

if you actually wanted a list

df.b.loc[s & (s != s.shift(-1))].tolist()

['c', 'h']