M.Taitt M.Taitt - 21 days ago 6
Python Question

Whats the fastest way to loop through a DataFrame and count occurrences within the DataFrame whilst some condition is fulfilled (in Python)?

I have a dataframe with two Boolean fields (as below).

import pandas as pd

d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
{'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]

df = pd.DataFrame(d)
df

Out[1]:
a1 a2
0 False False
1 True False
2 True False
3 False False
4 False True
5 False False
6 False False
7 True False
8 False True
9 False False


I am trying to find the fastest and most "Pythonic" way of achieving the following:


  • If a1==True, count instances from current row where a2==False (e.g. row 1: a1=True, a2 is False for three rows from row 1)

  • At first instance of a2==True, stop counting (e.g. row 4, count = 3)

  • Set value of 'count' to new df column 'a3' on row where counting began (e.g. 'a3' = 3 on row 1)



Target result set as follows.

a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0


I have been trying to accomplish this using for loops, iterrows and while loops and so far haven't been able to produce a good nested combination which provides the results I want. Any help appreciated. I apologize if the problem is not totally clear.

Answer Source

How about this:

df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)

So, if a1 is False write 0 else write the length of list that goes from that row until next True.