Excaliburst - 5 months ago
Python Question

pandas groupby object index turns into FUBAR

I have a pandas DataFrame where a groupby operation mangles the index. I need the dates as my index, sorted within each ticker group.

To illustrate, set up the DataFrame like so:

import pandas as pd
from io import StringIO

text = """Date Ticker Open High Low Adj_Close Volume
2015-04-09 vws.co 315.000000 316.100000 312.500000 311.520000 1686800
2015-04-10 vws.co 317.000000 319.700000 316.400000 312.700000 1396500
2015-04-13 vws.co 317.900000 321.500000 315.200000 315.850000 1564500
2015-04-14 vws.co 320.000000 322.400000 318.700000 314.870000 1370600
2015-04-15 vws.co 320.000000 321.500000 319.200000 316.150000 945000
2015-04-16 vws.co 319.000000 320.200000 310.400000 307.870000 2236100
2015-04-17 vws.co 309.900000 310.000000 302.500000 299.100000 2711900
2015-04-20 vws.co 303.000000 312.000000 303.000000 306.490000 1629700
2016-03-31 mmm 166.750000 167.500000 166.500000 166.630005 1762800
2016-04-01 mmm 165.630005 167.740005 164.789993 167.529999 1993700
2016-04-04 mmm 167.110001 167.490005 165.919998 166.399994 2022800
2016-04-05 mmm 165.179993 166.550003 164.649994 165.809998 1610300
2016-04-06 mmm 165.339996 167.080002 164.839996 166.809998 2092200
2016-04-07 mmm 165.880005 167.229996 165.250000 167.160004 2721900"""

df = pd.read_csv(StringIO(text), delim_whitespace=True, parse_dates=[0], index_col=0)
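
As a sanity check on the setup (a minimal sketch with a trimmed-down table; `sep=r'\s+'` is the regex equivalent of `delim_whitespace=True`): the parsed frame starts out with a flat `DatetimeIndex` and `Ticker` as an ordinary column.

```python
import pandas as pd
from io import StringIO

# Trimmed-down version of the sample table above.
text = """Date Ticker Adj_Close
2015-04-09 vws.co 311.52
2015-04-10 vws.co 312.70
2016-03-31 mmm 166.63"""

# sep=r'\s+' splits on runs of whitespace, like delim_whitespace=True.
df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

# Before any groupby, the index is a single-level DatetimeIndex named
# 'Date', and 'Ticker' is a regular column.
print(type(df.index).__name__)  # DatetimeIndex
print(df.index.nlevels)         # 1
```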


And the code

import pandas as pd
from pandas.io.data import DataReader
import numpy as np
import time
import os

stocklist = ['vws.co','nflx','mmm']


print ('df.tail (Input df)\n',df.tail(6),'\n')


def Screener(group):

    def diff_calc(group):

        df['Difference'] = df['Adj_Close'].diff()
        return df['Difference']

    df['Difference'] = diff_calc(group)
    return df

if __name__ == '__main__':

    df = GetStock(stocklist, start, end)
    df['Adj_Close'] = df['Adj Close']

    for ticker in stocklist:
        ### groupby screener (filtering to only the relevant ticker group)
        df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener)

    df.reset_index().sort(['Ticker', 'Date'], ascending=[1,1]).set_index('Ticker')
    print ('(Output df)\n',df,'\n')

    # Test the first 7 rows of each group for rolling_mean transgressing groups...
    df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
    print ('df_test (summary from df) (Output)\n',df_test,'\n')


My index is now mangled, and I do not know how this happened.

(Output df)
Ticker Open High Low Adj Close Adj_Close Date
0 0 0 2016-05-20 vws.co 443.00 446.30 441.40 442.90 442.90
2016-05-23 vws.co 442.00 446.70 439.90 439.90 439.90
2016-05-24 vws.co 439.10 450.00 438.10 450.00 450.00
2016-05-25 vws.co 455.50 466.10 454.30 464.90 464.90
2016-05-26 vws.co 465.00 470.80 464.60 464.60 464.60
2016-05-27 vws.co 464.00 480.70 461.20 476.00 476.00
2016-05-30 vws.co 477.00 481.80 473.10 475.00 475.00
2016-05-31 vws.co 474.00 479.30 472.20 479.00 479.00
2016-06-01 vws.co 477.40 480.20 472.90 474.40 474.40
2016-05-20 nflx 90.08 93.28 89.98 92.49 92.49
2016-05-23 nflx 92.98 95.29 92.85 94.89 94.89
2016-05-24 nflx 95.98 99.14 95.75 97.89 97.89
2016-05-25 nflx 99.00 100.31 98.30 100.20 100.20
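
What seems to be happening (a simplified sketch on the sample table, not the live `GetStock` data): when the function passed to `apply` returns a frame whose index does not match the group's own index, pandas concatenates the pieces with the group key (or its position, with `as_index=False`) prepended as an extra index level, and each pass of the `for` loop stacks one more level on.

```python
import pandas as pd
from io import StringIO

text = """Date Ticker Adj_Close
2015-04-09 vws.co 311.52
2015-04-10 vws.co 312.70
2016-03-31 mmm 166.63
2016-04-01 mmm 167.53"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

def screener(group):
    # Like the question's Screener, this ignores `group` and returns the
    # whole outer df, so the result's index never matches the group index.
    df['Difference'] = df['Adj_Close'].diff()
    return df

out = df.groupby('Ticker').apply(screener)

# The Date index is now prefixed with an extra group-key level.
print(out.index.nlevels)  # 2

# Dropping that level restores the original DatetimeIndex.
clean = out.reset_index(level=0, drop=True)
print(clean.index.name)   # Date
```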


I need the dates as my index, sorted within each ticker-group

Can anyone help?

Answer

OK, I finally found a solution. This question helped me get the index back to how I wanted it: How to get rows in pandas data frame, with maximal values in a column and keep the original index?

The code below gets rid of the unwanted column that each loop iteration added to the index. Thanks guys!

import pandas as pd
from pandas.io.data import DataReader
import numpy as np
import time
import os
from io import StringIO

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
    2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
    2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
    2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
    2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
    2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
    2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
    2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
    2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
    2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
    2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
    2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
    2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
    2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
    2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), delim_whitespace=True, parse_dates=[0], index_col=0)
runstart = time.time()     # Start script timer

stocklist = ['vws.co','nflx','mmm']#,'msft','tsla']
tickers =   []

def Screener(group):

    def diff_calc(group):

        df['Difference'] = df['Adj_Close'].diff()
        return df['Difference']

    df['Difference'] = diff_calc(group)
    return df

if __name__ == '__main__':

    for ticker in stocklist:
        ### groupby screener (filtering to only the relevant ticker group)
        df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener) #.reset_index()

        df = df.reset_index(level=0, drop=True)

    print ('(Output df)\n',df,'\n')

    # Test the first 7 rows of each group for rolling_mean transgress groups...
    df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
    print ('df_test (summary from df) (Output)\n',df_test,'\n')
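
For what it's worth, the per-ticker difference can also be computed without `apply` at all: `groupby(...)['col'].diff()` aligns its result on the original index, so there is no extra level to clean up afterwards. A sketch on the sample table, not the live DataReader feed:

```python
import pandas as pd
from io import StringIO

text = """Date Ticker Adj_Close
2015-04-09 vws.co 311.52
2015-04-10 vws.co 312.70
2016-03-31 mmm 166.63
2016-04-01 mmm 167.53"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

# diff() within each ticker; the result is aligned on the original
# DatetimeIndex, so the index never grows an extra level.
df['Difference'] = df.groupby('Ticker')['Adj_Close'].diff()

# Sort within each ticker group by date, keeping Date as the index.
df = df.reset_index().sort_values(['Ticker', 'Date']).set_index('Date')
```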