Python Question

How do I sum the rows of my DataFrame so that it only sums values based on the month, day, or year, and then form a report with all of the results?

I am analyzing stock data from Yahoo Finance. I currently have my DataFrame, df,
filtered to show only the days in the month of March since 1991. I want to be able to find out what the monthly returns are for March, by year, or for any other combination of months, i.e. what the returns are from January to March since 1991. I would also like that broken down by year.

I would also like to be able to do the same by day of the week, i.e. how much has Apple's stock changed on all the Fridays since 1991? This would be another sample question.

I am trying to get this to the point where I can print an actual paper copy that breaks it all down by year, like a report.

I have tried reading the multi-indexing and groupby tutorials on pandas.pydata.org, but they are very confusing, and I am not sure whether that is what I need.

This is my current code

from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import dateutil.parser
from tkinter import *


# Sets the max rows that can be displayed
# when the program is executed
pd.options.display.max_rows = 120



# df is the name of the DataFrame. It reads the csv file containing
# data downloaded from Yahoo Finance (Date, Open, High, Low, Close,
# Volume, Adj Close). The name of the ticker is placed before
# _data.csv, i.e. the ticker aapl would have a csv file named
# aapl_data.csv.
df = pd.read_csv("cde_data.csv")


# resets the index back to the pandas default, i.e. the index starts
# at 0 for the first row, 1 for the second, and continues by one until
# the end of the data in the above csv file. Note that reset_index
# returns a new DataFrame, so the result must be assigned back to df.
df = df.reset_index(drop=True)



# the following code allows for filtering of the dataframe
# based on the year, day of week (dow), and month. It is then
# applied to the dataframe and can be used to select data, e.g.
# print(df[(df.year == 2015) & (df.month == 5) & (df.dow == 4)])
# which will give you all the days in the month of May (df.month == 5)
# that fall on a Thursday (df.dow == 4) in the year 2015
# (df.year == 2015)
#
# Month Dow Year
# January = 1 Monday = 1 The year will be displayed in a
# February = 2 Tuesday = 2 four-digit format, i.e. 2015
# March = 3 Wednesday = 3
# April = 4 Thursday = 4
# May = 5 Friday = 5
# June = 6
# July = 7
# August = 8
# September = 9
# October = 10
# November = 11
# December = 12
def year(x):
    return x.year

def dow(x):
    return x.isoweekday()

def month(x):
    return x.month
df.Date = df.Date.apply(dateutil.parser.parse)
df['year'] = df.Date.apply(year)
df['dow'] = df.Date.apply(dow)
df['month'] = df.Date.apply(month)
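
# (Side note: the three apply calls above could also be written with
# pandas' vectorized datetime accessors. This is only a sketch of an
# equivalent approach and simply recomputes the same three columns;
# dt.weekday is Monday=0, so 1 is added to match the isoweekday
# Monday=1 convention used above.)
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df['Date'].dt.year
df['dow'] = df['Date'].dt.weekday + 1
df['month'] = df['Date'].dt.month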


# The code below has a total of five sections all labeled by number.
# They are #1, #2, #3, #4, #5. Number one adds new columns to the df
# and populates them with data, number two filters out all the days
# that the market went down or flat for the day, number three filters
# out all of the days that the market went up or flat, number four
# filters all of the days that the market went up or down, and
# number five drops the excess columns and concats steps #2, #3, & #4.


# 1
# There are five columns being added: up_down, up, down, flat, and
# %chg. up, down, and flat are temporary and will be deleted later on;
# the other two, up_down and %chg, are permanent.
# The up_down column is derived by taking the 'Close' column minus the
# 'Open' column, which tells you how much the stock has moved for the
# day. The 'up' column is temporary and has a value of 'up' for all
# the rows of the DataFrame df. The 'down' column is temporary and has
# a value of 'down' for all the rows of the DataFrame df. The 'flat'
# column is temporary and has a value of 'flat' for all the rows of
# the DataFrame df. The '%chg' column is calculated by taking the
# 'up_down' column divided by the 'Close' column, times 100, which
# turns it into a percentage showing what percent the stock moved up
# or down for the day. All of the columns added below are added to the
# DataFrame df, which contains the csv data (see the comment above the
# read_csv call for information on the csv file loaded into df).

df['up'] = 'up'
df['down'] = 'down'
df['flat'] = 'flat'
df['up_down'] = df['Close'] - df['Open']
df['%chg'] = ((df['up_down']/df['Close'])*100)


# 2
# df column [up_down] is first filtered on the greater-than-zero
# criterion, from the year 1984 onwards, and the result becomes df2.
# If the up_down column is greater than zero, then the stock went up
# that day. Next, df3 is set equal to df2['up']; df3 now holds just
# the days where the asset went up.
df2 = df[(df.year > 1984) & (df.up_down > 0)]
df3 = df2['up']



# 3
# df column [up_down] is first filtered on the less-than-zero
# criterion, from the year 1984 onwards, and the result becomes df4.
# If the up_down column is less than zero, then the stock went down
# that day. Next, df5 is set equal to df4['down']; df5 now holds just
# the days where the asset went down.
df4 = df[(df.year > 1984) & (df.up_down < 0)]
df5 = df4['down']



# 4
# df column [up_down] is first filtered on the equal-to-zero
# criterion, from the year 1984 onwards, and the result becomes df6.
# If the up_down column is equal to zero, then the stock did not move
# that day. Next, df7 is set equal to df6['flat']; df7 now holds just
# the days where the asset did not move at all.
df6 = df[(df.year > 1984) & (df.up_down == 0)]
df7 = df6['flat']



# 5
# The code below starts by dropping the temporary columns 'up',
# 'down', and 'flat', which were used to help filter data in sections
# two, three, and four above. Finally, we concat the DataFrames df,
# df3, df5, and df7. We now have new 'up', 'down', and 'flat' columns
# that only display up, down, or flat when the criterion is true.
df = df.drop(['up', 'down', 'flat'], axis=1)
# join_axes was removed in newer pandas; reindexing on df's index
# keeps the original rows, as join_axes=[df.index] used to do
df = pd.concat([df, df3, df5, df7], axis=1).reindex(df.index)
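
# (Side note: sections two through five could be collapsed into a
# single labelling pass with numpy's select. This is only a sketch of
# an equivalent approach; the 'direction' column name is illustrative
# and is not used elsewhere in this script.)
import numpy as np
conditions = [
    (df.year > 1984) & (df.up_down > 0),
    (df.year > 1984) & (df.up_down < 0),
    (df.year > 1984) & (df.up_down == 0),
]
df['direction'] = np.select(conditions, ['up', 'down', 'flat'], default='')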


# The difference between the close of the current day and the close
# of the previous day (non-percentage)
df['Up_Down'] = df.Close.diff()


# The percentage of change for the Up_Down column, i.e. the
# day-over-day close move as a percentage of the current close
df['%Chg'] = (df['Up_Down'] / df['Close']) * 100

# How much the current opening price has moved from the previous
# opening price, in percentage terms
df['Open%pd'] = df.Open.pct_change()*100


# How much the current high price has moved from the previous high
# price, in percentage terms
df['High%pd'] = df.High.pct_change()*100


# How much the current low price has moved from the previous low
# price, in percentage terms
df['Low%pd'] = df.Low.pct_change()*100


# How much the current close price has moved from the previous close
# price, in percentage terms
df['Close%pd'] = df.Close.pct_change()*100


# How much the current volume has moved from the previous day's
# volume, in percentage terms
df['Volume%pd'] = df.Volume.pct_change()*100
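
# (Side note: the five '%pd' columns above could also be computed in
# one loop over the price and volume columns; this sketch just
# recomputes the same values.)
for col in ['Open', 'High', 'Low', 'Close', 'Volume']:
    df[col + '%pd'] = df[col].pct_change() * 100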


# Both columns take the percentage of change from open to high and open
# to low
df['High%fo'] = ((df.High - df.Open)/(df.Open))*100
df['Low%fo'] = ((df.Open - df.Low) / (df.Open))*100


# Takes the difference between the high price and the low price
# (non-percentage)
df['HighLowRange'] = df.High - df.Low


# Measures how much the high-minus-low range has changed versus the
# previous day
df['HighLowRange%pd'] = df.HighLowRange.pct_change()*100


# df is now restricted to the month of March only and keeps just the
# Date and Close%pd columns
df = df.loc[df.month == 3, ['Date', 'Close%pd']]


print(df)


This is what df prints (all the days of March since 1991), as an example:

Date Close%pd
223 1991-03-01 2.097902
224 1991-03-04 1.369863
225 1991-03-05 -2.702703
226 1991-03-06 1.388889
227 1991-03-07 0.000000
228 1991-03-08 6.164384
229 1991-03-11 -4.516129
230 1991-03-12 0.675676
231 1991-03-13 2.684564
232 1991-03-14 -2.614379
233 1991-03-15 -1.342282
234 1991-03-18 -7.482993
235 1991-03-19 0.000000
236 1991-03-20 0.735294
237 1991-03-21 0.000000
238 1991-03-22 0.000000
239 1991-03-25 -0.729927
240 1991-03-26 0.000000
241 1991-03-27 0.735294
242 1991-03-28 0.000000
476 1992-03-02 0.000000
477 1992-03-03 0.000000
478 1992-03-04 0.000000
479 1992-03-05 0.000000
480 1992-03-06 0.000000
481 1992-03-09 -3.174603
482 1992-03-10 2.459016
483 1992-03-11 0.000000
484 1992-03-12 -1.600000
485 1992-03-13 0.813008
486 1992-03-16 -1.612903
487 1992-03-17 -1.639344
488 1992-03-18 -2.500000
489 1992-03-19 1.709402
490 1992-03-20 -1.680672
491 1992-03-23 -1.709402
492 1992-03-24 -1.739130
493 1992-03-25 2.654867
494 1992-03-26 -0.862069
495 1992-03-27 4.347826
496 1992-03-30 -1.666667
497 1992-03-31 -1.694915
728 1993-03-01 2.000000
729 1993-03-02 0.980392
730 1993-03-03 0.000000
731 1993-03-04 1.941748
732 1993-03-05 1.904762
733 1993-03-08 1.869159
734 1993-03-09 0.000000
735 1993-03-10 0.000000
736 1993-03-11 -1.834862
737 1993-03-12 3.738318
738 1993-03-15 4.504505
739 1993-03-16 -1.724138
740 1993-03-17 0.000000
741 1993-03-18 2.631579
742 1993-03-19 0.000000
743 1993-03-22 5.128205
744 1993-03-23 0.000000
745 1993-03-24 2.439024
... ... ...
6023 2014-03-10 -3.372835
6024 2014-03-11 -0.943396
6025 2014-03-12 2.761905
6026 2014-03-13 -1.019462
6027 2014-03-14 1.029963
6028 2014-03-17 -1.853568
6029 2014-03-18 4.815864
6030 2014-03-19 -2.792793
6031 2014-03-20 1.297498
6032 2014-03-21 -0.548948
6033 2014-03-24 -7.083717
6034 2014-03-25 0.198020
6035 2014-03-26 -8.003953
6036 2014-03-27 0.214823
6037 2014-03-28 3.215434
6038 2014-03-31 -3.530633
6269 2015-03-02 2.226027
6270 2015-03-03 0.670017
6271 2015-03-04 -4.991681
6272 2015-03-05 -2.101576
6273 2015-03-06 -10.017889
6274 2015-03-09 -5.367793
6275 2015-03-10 -6.722689
6276 2015-03-11 4.504505
6277 2015-03-12 1.293103
6278 2015-03-13 1.063830
6279 2015-03-16 6.315789
6280 2015-03-17 -5.544554
6281 2015-03-18 8.805031
6282 2015-03-19 -3.853565
6283 2015-03-20 7.014028
6284 2015-03-23 0.561798
6285 2015-03-24 1.862197
6286 2015-03-25 -3.473492
6287 2015-03-26 -1.325758
6288 2015-03-27 -3.262956
6289 2015-03-30 -3.968254
6290 2015-03-31 -2.685950
6521 2016-03-01 -1.295337
6522 2016-03-02 4.986877
6523 2016-03-03 12.250000
6524 2016-03-04 0.668151
6525 2016-03-07 11.061947
6526 2016-03-08 -7.370518
6527 2016-03-09 1.720430
6528 2016-03-10 3.382664
6529 2016-03-11 2.862986
6530 2016-03-14 -2.783300
6531 2016-03-15 -1.226994
6532 2016-03-16 9.730849
6533 2016-03-17 3.207547
6534 2016-03-18 2.193784
6535 2016-03-21 3.041145
6536 2016-03-22 0.694444
6537 2016-03-23 -9.482759
6538 2016-03-24 0.571429
6539 2016-03-28 3.030303
6540 2016-03-29 4.963235
6541 2016-03-30 -1.050788
6542 2016-03-31 -0.530973

[568 rows x 2 columns]

Answer

As in the comments, you can find monthly totals using groupby:

# change the previous last line of code to this
df = df.loc[df.month == 3, ['Date', 'year', 'month', 'Close%pd']]

# make a new dataframe of the monthly totals
new_df = df.groupby(['year', 'month']).sum()
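
The same groupby pattern extends to the other combinations asked about in the question. The sketch below is only illustrative (the variable names are hypothetical) and assumes it is applied before the March-only filter, while df still has the year, dow, and Close%pd columns; selecting the single Close%pd column before summing keeps the Date column out of the aggregation.

# January-to-March daily changes, summed per year
q1_by_year = df[df.month.isin([1, 2, 3])].groupby('year')['Close%pd'].sum()

# all Fridays (dow == 5), summed per year
fridays_by_year = df[df.dow == 5].groupby('year')['Close%pd'].sum()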

An alternative method would be to use the resample command (see the pandas documentation). This is probably the best way to go for weekly totals, especially since you don't have a variable indicating the week of the year, which is what you would otherwise pass to groupby. Note that resample works on a DatetimeIndex, so set the Date column as the index first.

df = df.set_index('Date')    # resample needs a DatetimeIndex
df = df.resample('W').sum()  # weekly totals
df = df.resample('M').sum()  # monthly totals
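
If the end goal is a printable year-by-year report, the grouped result can also be pivoted so that each year is a row and each month a column. This is only a rough sketch along the lines of the groupby approach above (the report name is just for illustration), and it assumes df still has the year, month, and Close%pd columns, i.e. before resampling.

report = df.groupby(['year', 'month'])['Close%pd'].sum().unstack('month')
print(report.to_string())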