Gil Baggio Gil Baggio - 5 months ago 8
Python Question

I want to change the format of my DataFrame in pandas. How can I do this?

My current dataframe looks like this:

In [40]: result
Out[40]:
monthyear Facility Date Yield Planned
0 Dec 15 CCM2 2015-12-01 2550.000000 0.0
1 Feb 16 CCM2 2016-02-01 4250.000000 0.0
2 Jan 16 CCM1 2016-01-01 1540.000000 0.0
3 Jan 16 CCM2 2016-01-01 6800.000000 0.0
4 Nov 15 CCM1 2015-11-01 921.458157 880.0
5 Nov 15 CCM2 2015-11-01 1750.310038 3000.0
6 Sep 15 CCM2 2015-09-01 5191.197065 8000.0


I need to change its format to the following:

monthyear Date CCM1 CCM2 Planned
0 Dec 15 2015-12-01 0.000000 2550.000000 0.0
1 Feb 16 2016-02-01 0.000000 4250.000000 0.0
2 Jan 16 2016-01-01 1540.000000 6800.000000 0.0
3 Nov 15 2015-11-01 921.458157 1750.310038 3880.0
4 Sep 15 2015-09-01 0.000000 5191.197065 8000.0


How can I do this?? Thanks in advance.

Answer

Use pivot_table and then groupby by index with aggregating sum:

df1 = (df.pivot_table(index=['monthyear','Date','Planned'], 
                      columns='Facility', 
                      values='Yield',
                      fill_value=0).reset_index(level=2))
print (df1)

Facility              Planned         CCM1         CCM2
monthyear Date                                         
Dec 15    2015-12-01      0.0     0.000000  2550.000000
Feb 16    2016-02-01      0.0     0.000000  4250.000000
Jan 16    2016-01-01      0.0  1540.000000  6800.000000
Nov 15    2015-11-01    880.0   921.458157     0.000000
          2015-11-01   3000.0     0.000000  1750.310038
Sep 15    2015-09-01   8000.0     0.000000  5191.197065

print (df1.groupby(level=[0,1]).sum())

Facility              Planned         CCM1         CCM2
monthyear Date                                         
Dec 15    2015-12-01      0.0     0.000000  2550.000000
Feb 16    2016-02-01      0.0     0.000000  4250.000000
Jan 16    2016-01-01      0.0  1540.000000  6800.000000
Nov 15    2015-11-01   3880.0   921.458157  1750.310038
Sep 15    2015-09-01   8000.0     0.000000  5191.197065

Last use rename_axis (new in pandas 0.18.0):

print (df1.groupby(level=[0,1]).sum().reset_index().rename_axis(None, axis=1))
  monthyear        Date  Planned         CCM1         CCM2
0    Dec 15  2015-12-01      0.0     0.000000  2550.000000
1    Feb 16  2016-02-01      0.0     0.000000  4250.000000
2    Jan 16  2016-01-01      0.0  1540.000000  6800.000000
3    Nov 15  2015-11-01   3880.0   921.458157  1750.310038
4    Sep 15  2015-09-01   8000.0     0.000000  5191.197065