CronosVirus00 CronosVirus00 - 2 months ago 8
Python Question

Counting how many times in a row the result of a sum is positive (or negative)

I have a dataframe with finance data (33023 rows, here the link to the data: https://mab.to/Ssy3TelRs); df.open is the price of the title and df.close is the closing price.

I have been trying to see how many times in a row the title closed with a gain and with a lost.

The result that I'm looking for should tell me that the title was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times and so forth.

I have started with a for:

for x in range(1,df.close.count()):
y = df.close[x]-df.open[x]


and then unsuccessful series of if statements...

Thank you for your help.

CronosVirus00

EDITS:

>>> df.head(7)
data ora open max min close Unnamed: 6
0 20160801 0 1.11781 1.11781 1.11772 1.11773 0
1 20160801 100 1.11774 1.11779 1.11773 1.11777 0
2 20160801 200 1.11779 1.11800 1.11779 1.11795 0
3 20160801 300 1.11794 1.11801 1.11771 1.11771 0
4 20160801 400 1.11766 1.11772 1.11763 1.11772 0
5 20160801 500 1.11774 1.11798 1.11774 1.11796 0
6 20160801 600 1.11796 1.11796 1.11783 1.11783 0


Ifs:

for x in range(1,df.close.count()):
y = df.close[x]-df.open[x]
if y > 0 :
green += 1
y = df.close[x+1] - df.close[x+1]
twotimes += 1
if y > 0 :
green += 1
y = df.close[x+2] - df.close[x+2]
threetimes += 1
if y > 0 :
green += 1
y = df.close[x+3] - df.close[x+3]
fourtimes += 1

Answer

This should work:

import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])

test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0

for i in test.index[1:]:
    if test['gain?'][i]:
        test['cumulative'][i] = test['cumulative'][i-1] + 1
        test['cumulative'][i-1] = 0

results = test['cumulative'].value_counts()

Ignore the '0' row in the results. It can be modified without too much trouble if you want to e.g count both days in a run-of-two as runs-of-one as well.

Edit: without the warnings -

import pandas as pd
import numpy as np

test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0

for i in test.index[1:]:
    if test['gain?'][i]:
        test.loc[i,'cumulative'] = test.loc[i-1,'cumulative'] + 1
        test.loc[i-1,'cumulative'] = 0

results = test['cumulative'].value_counts()