Stacey Stacey - 7 months ago 22
Python Question

Return data between two dates

I have some code which parses through a number of .CVS files, retrieves all the data for several columns and places the data in a data frame (called dfs). I am now trying to return all of the data fields in dfs that are between two dates only.

I am trying to use the command:

return dfs[(dfs['date'] >= startDate) & (dfs['date'] <= endDate)]


but get the following error:

KeyError: 'date'


Could someone let me know what I've done wrong please?
Please see below for my code:

def getTimeseriesData(path,column_num,startDate,endDate):
colNames = ['date']
dfs = []

allfiles = glob.glob(os.path.join(path, "*.csv"))
for fname in allfiles:

name = os.path.splitext(fname)[0]
name = os.path.split(name)[1]
colNames.append(name)

df = pd.read_csv(fname, header=None, usecols=[0, column_num,4,5],
parse_dates=[0], dayfirst=True,
index_col=[0], names=['date', name+'_LAST',name+'_VOLUME',name+'_MKTCAP'])

df = df.groupby(level=0).agg('mean')

dfs.append(df)

dfs = pd.concat(dfs, axis=1)

return dfs[(dfs['date'] >= startDate) & (dfs['date'] <= endDate)] #<<--I think this is the problem


the head of dfs (from which I want to return data between two dates (say between 2001-01-03 and 2001-01-05) looks like this:

BBG.XLON.BTA.S_LAST BBG.XLON.BTA.S_VOLUME BBG.XLON.BTA.S_MKTCAP \
date
2001-01-02 572 26605510 37494.60
2001-01-03 560 24715470 36708.00
2001-01-04 613 52781855 40182.15
2001-01-05 630 56600152 41296.50
2001-01-08 633 41014402 41493.15

BBG.XLON.VOD.S_LAST BBG.XLON.VOD.S_VOLUME BBG.XLON.VOD.S_MKTCAP
date
2001-01-02 NaN NaN NaN
2001-01-03 225.00 444328736 145216.0020
2001-01-04 239.00 488568000 154251.6643
2001-01-05 242.25 237936704 156349.2288
2001-01-08 227.75 658059776 146990.8642

Answer

Here date is the name of your index and not a column name:

Change:

return dfs[(dfs['date'] >= startDate) & (dfs['date'] <= endDate)] 

into:

return dfs[(dfs.index >= startDate) & (dfs.index <= endDate)]