CronosVirus00 - 2 months ago
Python Question

Part II: Counting how many times in a row the result of a sum is positive (or negative)

This is the second part. The first part can be found here: Click me

Hi all, I have been practising with the gg function that you guys helped me create (see part one). I have now realized that the function does not count unique series; it counts overlapping ones: for instance, a series of 3 positives in a row is also counted as 2 series of two positives in a row and as 3 single positives.

Let's say I got this:

df = pd.DataFrame(np.random.rand(15, 2), columns=["open", "close"])
df['test'] = df.close-df.open > 0

open close test
0 0.769829 0.261478 False
1 0.770246 0.128516 False
2 0.266448 0.346099 True
3 0.302941 0.065790 False
4 0.747712 0.730082 False
5 0.382923 0.751792 True
6 0.028505 0.083543 True
7 0.137558 0.243148 True
8 0.456349 0.649780 True
9 0.041046 0.163488 True
10 0.291495 0.617486 True
11 0.046561 0.038747 False
12 0.782994 0.150166 False
13 0.435168 0.080925 False
14 0.679253 0.478050 False

df.test
Out[113]:
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 False
12 False
13 False
14 False


As output, I would like the number of unique series of True in a row, broken down by length; something like:

1: 1
2: 0
3: 0
4: 0
5: 0
6: 1
7: 0
8: 0


What I've tried so far:

(green.rolling(x).sum()>x-1).sum() #gives me how many times there is a series of x True in a row; yet, this is not unique as explained beforehand


However, I do not feel that rolling is the solution here...
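To make the overcounting concrete, here is a minimal sketch that applies the rolling expression above to the `test` column shown earlier. The column is typed in by hand, and the names `test` and `window_counts` are illustrative only; `astype(int)` is added as a precaution so that the rolling sum operates on numbers.

```python
import pandas as pd

# The 'test' column from the example above, typed in by hand.
test = pd.Series([False, False, True, False, False,
                  True, True, True, True, True, True,
                  False, False, False, False])

window_counts = {}
for x in range(1, 9):
    # Number of length-x windows that contain only True values.
    all_true = test.astype(int).rolling(x).sum() > x - 1
    window_counts[x] = int(all_true.sum())

for x, n in window_counts.items():
    print(f"{x}: {n}")
```

The single run of six True values is counted once for every window that fits inside it, so the counts for lengths 1 to 5 are inflated instead of being 1, 0, 0, 0, 0.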

Thank you again for your help,
CronosVirus00

Answer

What you are looking for is the groupby function from itertools together with Counter from collections. Here is how to achieve what you want:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(15, 2), columns=["open", "close"])
df['test'] = df.close-df.open > 0 

from itertools import groupby
from collections import Counter
# group consecutive equal values and record (value, run length) for each run
seq_len = [(k, len(list(g))) for k, g in groupby(list(df['test']))]
# keep only the lengths of the True runs
true_seq_len = [n for k, n in seq_len if k]
# count how many runs there are of each length
true_seq_count = Counter(true_seq_len)

Output:

>>> print(df['test'])
0      True
1      True
2     False
3      True
4      True
5     False
6      True
7     False
8      True
9      True
10     True
11     True
12    False
13    False
14     True
>>> print(seq_len)
[(True, 2), (False, 1), (True, 2), (False, 1), (True, 1), (False, 1), (True, 4), (False, 2), (True, 1)]
>>> print(true_seq_count)
Counter({1: 2, 2: 2, 4: 1})
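
Since the data is random, the output above differs from the data in the question. As a rough sketch, the same approach can be run on the question's `test` column (typed in by hand here) and printed in the requested layout. The names `true_runs` and `run_counts` and the upper bound of 8 are assumptions chosen to match the example in the question.

```python
from collections import Counter
from itertools import groupby

# The 'test' column from the question, typed in by hand.
test = [False, False, True, False, False,
        True, True, True, True, True, True,
        False, False, False, False]

# Length of every run of consecutive True values.
true_runs = [len(list(group)) for key, group in groupby(test) if key]
run_counts = Counter(true_runs)

# A Counter returns 0 for keys it has not seen, so lengths
# that never occur are printed as zero.
for length in range(1, 9):
    print(f"{length}: {run_counts[length]}")
```

This prints one run of length 1 and one run of length 6, with zeros for every other length, which is the layout asked for in the question.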