template boy template boy - 2 years ago 211
Python Question

How to calculate a rolling mean from a series of dates spanning years?

I have a CSV file containing stock data spanning 10 years. How do I find the rolling mean of those dates using Pandas? I've tried:

pd.rolling_mean(df.resample("1D", fill_method="ffill"), window=3, min_periods=1)


that I got from a previous answer. But then I get the error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex


Why does this happen and what is the correct way to do this?

Here is a bit of the CSV file data:

Date Open High Low Close Avg Vol Adj Close
Nov 1, 2012 16.90 19.16 16.86 18.77 27,813,200 18.77
Oct 1, 2012 16.00 16.89 15.65 16.84 23,074,600 16.84
Sep 4, 2012 14.64 16.20 14.59 15.98 21,900,500 15.98
Aug 1, 2012 15.86 16.37 14.59 14.65 17,527,600 14.65
Jul 2, 2012 15.80 16.17 15.23 15.84 15,348,400 15.84
Jun 1, 2012 15.04 15.83 14.81 15.83 14,102,700 15.83
May 1, 2012 15.51 16.00 14.80 15.24 19,935,500 15.24
Apr 2, 2012 15.19 15.70 14.73 15.54 16,640,200 15.54
Mar 1, 2012 14.89 15.61 14.35 15.22 15,408,200 15.22
Feb 1, 2012 15.57 16.31 14.37 14.83 22,020,900 14.83
Jan 3, 2012 16.27 16.39 15.35 15.47 18,167,300 15.47
Dec 1, 2011 16.42 16.46 14.57 16.13 21,943,000 16.13
Nov 1, 2011 14.95 16.50 14.75 15.71 24,382,900 15.71
Oct 3, 2011 13.70 16.79 13.37 15.64 36,438,000 15.64
Sep 1, 2011 13.67 15.40 12.45 13.17 42,743,300 13.17
Aug 1, 2011 13.24 13.98 11.09 13.61 33,473,700 13.61
Jul 1, 2011 15.08 15.95 13.04 13.10 29,550,000 13.10
Jun 1, 2011 16.34 16.43 14.50 15.04 25,772,600 15.04
May 2, 2011 17.79 18.84 15.63 16.55 41,113,700 16.55
Apr 1, 2011 16.83 17.77 16.06 17.70 20,728,700 17.70
Mar 1, 2011 16.46 17.70 15.81 16.68 25,018,300 16.68
Feb 1, 2011 16.33 17.84 16.04 16.40 24,578,700 16.40
Jan 3, 2011 16.81 17.34 15.41 16.12 23,201,900 16.12
Dec 1, 2010 16.00 17.22 16.00 16.63 12,970,200 16.63
Nov 1, 2010 16.50 17.60 15.77 15.82 21,166,800 15.82
Oct 1, 2010 14.19 16.76 14.13 16.49 29,726,500 16.49
Sep 1, 2010 13.20 14.53 13.13 14.17 27,527,400 14.17
Aug 2, 2010 14.01 14.52 12.94 13.11 13,811,800 13.11
Jul 1, 2010 13.99 15.60 13.52 13.88 24,186,600 13.88
Jun 1, 2010 15.31 15.84 13.79 13.84 21,523,300 13.84
May 3, 2010 16.68 16.99 14.63 15.34 26,797,300 15.34
Apr 1, 2010 16.58 19.12 16.22 16.53 33,650,200 16.53
Mar 1, 2010 15.43 16.94 15.40 16.53 21,831,600 16.53
Feb 1, 2010 15.14 15.71 14.48 15.31 21,501,000 15.31
Jan 4, 2010 16.94 17.30 14.90 15.01 24,600,000 15.01
Dec 1, 2009 15.03 16.99 14.85 16.78 18,305,700 16.78
Nov 2, 2009 15.75 16.36 14.80 14.97 20,043,500 14.97
Oct 1, 2009 17.65 18.02 15.74 15.90 25,250,700 15.90
Sep 1, 2009 14.50 17.94 13.97 17.81 36,378,300 17.81
Aug 3, 2009 14.56 15.21 14.30 14.61 28,097,800 14.61
Jul 1, 2009 15.49 17.68 14.22 14.32 36,381,400 14.32
Jun 1, 2009 16.17 16.99 14.55 15.66 19,810,700 15.66
May 1, 2009 14.50 15.84 13.91 15.84 24,369,700 15.84
Apr 1, 2009 12.70 15.39 12.60 14.29 24,663,000 14.29
Mar 2, 2009 12.85 14.14 12.31 12.81 20,620,800 12.81
Feb 2, 2009 11.50 14.00 11.49 13.23 18,837,100 13.23
Jan 2, 2009 12.17 13.56 10.81 11.73 21,408,600 11.73
Dec 1, 2008 11.82 13.57 10.50 12.20 20,089,400 12.20
Nov 3, 2008 12.74 14.89 8.94 11.51 31,084,900 11.51
Oct 1, 2008 17.17 17.31 11.25 12.82 32,771,500 12.82
Sep 2, 2008 19.63 20.82 16.88 17.30 24,333,600 17.30
Aug 1, 2008 20.09 20.82 18.87 19.38 13,981,900 19.38
Jul 1, 2008 20.48 24.80 19.59 19.89 30,010,200 19.89

Answer Source

You need to convert the Date column to an Datetime, and then set an index on it:

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index("Date")
pd.rolling_mean(df.resample("1D", fill_method="ffill"), window=3, min_periods=1)

Output like:

Date        Open   High       Low        Close      Avg         Vol
2008-07-01  20.48  24.800000  19.590000  19.890000  30.000000   10.000000
2008-07-02  20.48  24.800000  19.590000  19.890000  30.000000   10.000000
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download