Dhyerwolf Dhyerwolf - 4 months ago 24
Python Question

Convert all columns in a Dataframe to a single secondary index in a MultiIndex Dataframe

I have a dataframe called Pop that looks like the following (Sorry, this doesn't format very well, but it's a single Index DataFrame (State is the index) with columns for each year. I only put a small sample of the data):

2007 2008 2009
State
Alabama 4637904 4677464 4708708


Is there a way to easily convert this to a MultiIndex DataFrame where the Year becomes a secondary index. The only item is a new field 'Population' that takes the data from the original dataFrame. It would ideally look something like:

Population
State Year

Alabama 2007 4637904

Alabama 2008 4677464

Alabama 2009 4708708


etc...

I've been looking for some kind of code since this seems like something that could be handled in Python. Thank you!

3kt 3kt
Answer

You can use melt to convert columns into rows :

In [10]: df
Out[10]: 
     state      2007      2008      2009
0  Alabama   4637904   4677464   4708708
1    Maine   1234567   1432765   1432567
2  Florida  19432765  19123456  19765654

In [13]: out = pd.melt(df, id_vars=['state'], var_name=['year'], value_name='population')

In [14]: out
Out[14]: 
     state  year  population
0  Alabama  2007     4637904
1    Maine  2007     1234567
2  Florida  2007    19432765
3  Alabama  2008     4677464
4    Maine  2008     1432765
5  Florida  2008    19123456
6  Alabama  2009     4708708
7    Maine  2009     1432567
8  Florida  2009    19765654

And finally setting the index :

In [15]: out.set_index(['state', 'year'])
Out[15]: 
              population
state   year            
Alabama 2007     4637904
Maine   2007     1234567
Florida 2007    19432765
Alabama 2008     4677464
Maine   2008     1432765
Florida 2008    19123456
Alabama 2009     4708708
Maine   2009     1432567
Florida 2009    19765654

PS : I used random values for the states population