CF84 - 2 months ago
Python Question

Pandas: adding new column to existing Data Frame for grouping purposes

I have a pandas DataFrame consisting of 2000 rows x 8 columns. I want to be able to group the first 4 columns together, and likewise the last 4, but I can't figure out how. The purpose is to create a categorical bar plot, with colors assigned according to C1=C5, C2=C6, and so forth.

My Data Frame:

In[1]: df.head(5)
Out[1]:

C1 C2 C3 C4 C5 C6 C7 C8
0 15 37 17 10 8 11 19 86
1 39 84 11 5 5 13 9 11
2 10 20 30 51 74 62 56 58
3 88 2 1 3 9 6 0 17
4 17 17 32 24 91 45 63 48


Do you suggest adding another column such as df['Gr'], or something else?

Answer

You can use MultiIndex.from_arrays:

import pandas as pd

# Outer level 'a' covers C1..C4, outer level 'b' covers C5..C8
df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4, df.columns])
print(df)
    a               b            
   C1  C2  C3  C4  C5  C6  C7  C8
0  15  37  17  10   8  11  19  86
1  39  84  11   5   5  13   9  11
2  10  20  30  51  74  62  56  58
3  88   2   1   3   9   6   0  17
4  17  17  32  24  91  45  63  48
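For anyone who wants to try this without the full 2000-row frame, here is a minimal sketch that rebuilds the five rows shown in the question and pulls each group back out. The names `rows`, `sample`, `group_a` and `group_b` are only illustrative, not from the original post.

```python
import pandas as pd

# First five rows from the question, rebuilt by hand for illustration
rows = [
    [15, 37, 17, 10, 8, 11, 19, 86],
    [39, 84, 11, 5, 5, 13, 9, 11],
    [10, 20, 30, 51, 74, 62, 56, 58],
    [88, 2, 1, 3, 9, 6, 0, 17],
    [17, 17, 32, 24, 91, 45, 63, 48],
]
sample = pd.DataFrame(rows, columns=[f"C{i}" for i in range(1, 9)])

# Outer level: group label; inner level: the original column name
sample.columns = pd.MultiIndex.from_arrays(
    [["a"] * 4 + ["b"] * 4, sample.columns]
)

# Two equivalent ways to select one group; both drop the outer level
group_a = sample["a"]             # columns C1..C4
group_b = sample.xs("b", axis=1)  # columns C5..C8

print(group_a.columns.tolist())
print(group_b.iloc[0].tolist())
```

Selecting by the outer label returns an ordinary DataFrame with the inner column names, so each group can be plotted or aggregated on its own.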

Then you can use xs and DataFrame.plot.bar:

import matplotlib.pyplot as plt

f, a = plt.subplots(2,1)
df.xs('a', axis=1).plot.bar(ax=a[0])
df.xs('b', axis=1).plot.bar(ax=a[1])
plt.show()
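The question also asks for matching colors (C1=C5, C2=C6, ...). One possible way to get that, assuming it means that the n-th column of each group should share a color, is to pass the same color list to both bar plots. This is only a sketch on the five sample rows: the palette, the `Agg` backend line and the output file name `grouped_bars.png` are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

rows = [
    [15, 37, 17, 10, 8, 11, 19, 86],
    [39, 84, 11, 5, 5, 13, 9, 11],
    [10, 20, 30, 51, 74, 62, 56, 58],
    [88, 2, 1, 3, 9, 6, 0, 17],
    [17, 17, 32, 24, 91, 45, 63, 48],
]
df = pd.DataFrame(rows, columns=[f"C{i}" for i in range(1, 9)])
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

# One color per position within a group, so C1/C5 share the first color, etc.
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]

fig, axes = plt.subplots(2, 1, sharex=True)
for ax, group in zip(axes, ["a", "b"]):
    # xs selects the group's four columns; color is applied column by column
    df.xs(group, axis=1).plot.bar(ax=ax, color=colors, rot=0, title=f"group {group}")

fig.tight_layout()
fig.savefig("grouped_bars.png")
```

With 2000 rows the x-axis would be very crowded, so it may make sense to plot a slice such as `df.head(20)` or an aggregate instead.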



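Two other options that draw everything in a single chart:

import matplotlib.pyplot as plt

# Option 1: start from the original df (flat columns C1..C8),
# add the group level, move it into the rows, then plot the transpose
df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4, df.columns])
df.stack(0).T.plot.bar(rot=0, legend=False)

# Option 2: label the columns by group only and plot the transpose
# (rot must be a number, and the plot result should not overwrite df)
df.columns = ['a'] * 4 + ['b'] * 4
df.T.plot.bar(rot=0)

plt.show()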