Richard Richard - 1 month ago 15
Python Question

How to create hierarchical columns in pandas?

I have a pandas dataframe that looks like this:

rank_2015 num_2015 rank_2014 num_2014 .... num_2008
France 8 1200 9 1216 .... 1171
Italy 11 789 6 788 .... 654


Now I want to draw a bar chart of the sums just the
num_
columns, by year. So on the x-axis I would like years from 2008 to 2015, and on the y-axis I would like the sum of the related
num_
column.

What's the best way to do this? I know how to get the sums for each column:

df.sum()


But what I don't know is how to chart only the
num_
columns, and also how to re-label those columns so that the labels are integers rather than strings, in order to get them to chart correctly.

I'm wondering if I want to create hierarchical columns, like this:

rank num
2015 2014 2015 2014 .... 2008
France 8 9 1200 1216 .... 1171
Italy 11 6 789 788 .... 654


Then I could just chart the columns in the
num
section.

How can I get my dataframe into this shape?

Answer

You could use str.extract with the regex pattern (.+)_(\d+) to convert the columns to a DataFrame:

cols = df.columns.str.extract(r'(.+)_(\d+)', expand=True)
#       0     1
# 0   num  2008
# 1   num  2014
# 2   num  2015
# 3  rank  2014
# 4  rank  2015

You can then build a hierarchical (MultiIndex) index from cols and reassign it to df.columns:

df.columns = pd.MultiIndex.from_arrays((cols[0], cols[1]))

so that df becomes

         num             rank     
        2008  2014  2015 2014 2015
France  1171  1216  1200    9    8
Italy    654   788   789    6   11

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({ 'num_2008': [1171, 654],
 'num_2014': [1216, 788],
 'num_2015': [1200, 789],
 'rank_2014': [9, 6],
 'rank_2015': [8, 11]}, index=['France', 'Italy'])


cols = df.columns.str.extract(r'(.+)_(\d+)', expand=True)
cols[1] = pd.to_numeric(cols[1])
df.columns = pd.MultiIndex.from_arrays((cols[0], cols[1]))
df.columns.names = [None]*2

df['num'].sum().plot(kind='bar')
plt.show()

enter image description here