megadarkfriend megadarkfriend - 28 days ago 22
Python Question

reindex sorted pandas dataframe

I have a dataframe like so:

Column A Column B Date Value
1 A 1 2011-01-01 10
2 B 1 2011-01-01 10
3 A 2 2011-01-01 10
4 B 2 2011-01-01 10
5 A 1 2011-01-02 10
6 B 1 2011-01-02 10
7 A 2 2011-01-02 10
8 B 2 2011-01-02 10
9 A 1 2011-01-03 10
10 B 1 2011-01-03 10
11 B 2 2011-01-03 10


I want to find missing dates for every value of A and B (in this case, it would be A, date: 2011-01-03), and insert NaN there. I tried the reindex function:

df.sort_values(['Column A','Column B'],ascending = [True,True], inplace = True)
df.index = range(1,len(df)+1)
dates = pd.date_range('2011-01-01','2011-01-03')
df = df.reindex(dates, fill_value = None)
print df


But it gives me NaN in every column. Does anyone have any suggestions as to how I can flag these missing values?

Answer

Try this:

df.set_index(['Column A', 'Column B', 'Date']).unstack().stack(dropna=False).sort_index(level=[2, 0]).reset_index()

   Column A  Column B        Date  Value
0         A         1  2011-01-01   10.0
1         B         1  2011-01-01   10.0
2         A         2  2011-01-01   10.0
3         B         2  2011-01-01   10.0
4         A         1  2011-01-02   10.0
5         B         1  2011-01-02   10.0
6         A         2  2011-01-02   10.0
7         B         2  2011-01-02   10.0
8         A         1  2011-01-03   10.0
9         B         1  2011-01-03   10.0
10        A         2  2011-01-03    NaN
11        B         2  2011-01-03   10.0

Note: row 10 has the missing value NaN