KHibma - 24 days ago
Python Question

Add missing dates to pandas dataframe

[I've found quite a bit of talk on similar issues, but nothing on how I'm approaching this.]

I have a CSV file I'm parsing which could have multiple events on a given date. Sometimes there are NO events on a date. I take these events, get a count by date and plot them.

However, when I plot them, my two series don't always match.

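import pandas as pd
import matplotlib.pyplot as plt

# Parse the date column itself; parse_dates=True only tries to parse the index
df = pd.read_csv(inFile, parse_dates=['simpleDate'])

# One entry per calendar day between the first and last event
idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
# Event count per date; days with no events are simply absent
s = df.groupby(['simpleDate']).size()

fig, ax = plt.subplots()
ax.bar(idx.to_pydatetime(), s, color='green')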


In the above code, idx becomes a range of, say, 30 dates (09-01-2013 to 09-30-2013).
However, s may only have 25 or 26 entries, because no events happened on some dates. I then get an AssertionError because the sizes don't match.

What's the proper way to tackle this?
Should I remove the dates with no values from idx, or (which I'd rather do) add the missing dates to the series with a count of 0? I'd rather have a full graph of 30 days, with 0 values on days without events. If that approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here's a snippet of s (df.groupby(['simpleDate']).size()); notice there are no entries for 09-04 and 09-05.

09-02-2013 2
09-03-2013 10
09-06-2013 5
09-07-2013 1
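To see the mismatch concretely, here is a minimal sketch that rebuilds a hypothetical events table whose per-date counts match the snippet above. The individual rows are made up; only the `simpleDate` column name comes from the question. As in the question's code, `idx` spans every day from the first to the last event, while `s` only has a row for days that actually had events.

```python
import pandas as pd

# Hypothetical event rows: one row per event, repeated to match the
# counts in the snippet (2, 10, 5 and 1 events on four dates).
counts = {'09-02-2013': 2, '09-03-2013': 10, '09-06-2013': 5, '09-07-2013': 1}
rows = [day for day, n in counts.items() for _ in range(n)]
df = pd.DataFrame({'simpleDate': pd.to_datetime(rows, format='%m-%d-%Y')})

# Same two steps as in the question.
idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

print(len(idx))  # 6: every calendar day from 09-02 to 09-07
print(len(s))    # 4: only the days that had at least one event
```

Passing these two objects of different lengths to `ax.bar` is what triggers the size error.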

Answer

You could use Series.reindex:

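import pandas as pd

# Every day in the target range, including days with no events
idx = pd.date_range('09-01-2013', '09-30-2013')

# Sample counts from the question; 09-04 and 09-05 are missing
s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
# Convert the string labels to timestamps so they line up with idx
s.index = pd.DatetimeIndex(s.index)

# Align s to idx; dates missing from s get a count of 0
s = s.reindex(idx, fill_value=0)
print(s)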

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...
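Putting this back into the question's pipeline, a minimal sketch might look like the following. It assumes the CSV has a `simpleDate` column in MM-DD-YYYY form; the inline CSV text stands in for `inFile` and is made up for illustration. For comparison it also computes the counts with `resample('D').size()`, which bins the events by calendar day and reports 0 for empty days, so no separate `idx` is needed when the range is simply first-to-last event.

```python
import io

import pandas as pd

# Hypothetical stand-in for inFile: one row per event.
csv_text = """simpleDate,event
09-02-2013,a
09-02-2013,b
09-03-2013,c
09-06-2013,d
09-07-2013,e
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse with an explicit format so month/day order is unambiguous.
df['simpleDate'] = pd.to_datetime(df['simpleDate'], format='%m-%d-%Y')

# Approach 1 (the answer): count per date, then reindex onto the full range.
idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby('simpleDate').size().reindex(idx, fill_value=0)

# Approach 2: let resample build the daily bins; empty days count as 0.
s2 = df.set_index('simpleDate').resample('D').size()

print(s.tolist())   # [2, 1, 0, 0, 1, 1]
print(s2.tolist())  # [2, 1, 0, 0, 1, 1]
```

Either series now has exactly one value per day in `idx`, so the question's plotting call, e.g. `ax.bar(idx.to_pydatetime(), s, color='green')`, no longer sees mismatched lengths. If the plot should cover the whole month rather than first-to-last event, build `idx` with explicit endpoints as in the answer and keep the `reindex` step; `resample` only spans the dates present in the data.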