HappyPy HappyPy -3 years ago 199
Python Question

reshaping data frame in pandas

Let's say I have this data frame:

df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})


and this is what I'm trying to get:

col1 col2
l n l n
12 0 32 1
16 1 47 1
92 0 22 0
77 0 14 1


I've been playing around with
set_index
and
stack
/
unstack
methods but with no success...

Answer Source
import pandas as pd

df = pd.DataFrame(
    {'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14],
     'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})

df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')
print(result)
#           l           n      
# cols   col1  col2  col1  col2
# index                        
# 0        12    32     0     1
# 1        16    47     1     1
# 2        92    22     0     0
# 3        77    14     0     1

If you care about the order of the labels in the MultiIndex column, you could use stack and unstack to exactly reproduce result you posted:

result = result.stack(level=0).unstack(level=1)
print(result)

# cols   col1     col2   
#           l  n     l  n
# index                  
# 0        12  0    32  1
# 1        16  1    47  1
# 2        92  0    22  0
# 3        77  0    14  1

When looking for a solution it is often useful to think backwards.

Start with the desired DataFrame and ask yourself what operation might result in the desired DataFrame. In this case, the operation that came to mind was pd.pivot. Then the question becomes, what DataFrame, something, is needed so that

desired = something.pivot(index='index', columns='cols') 

By looking at other examples of pivot in action, it became clear than something had to equal

   cols   l  n  index
0  col1  12  0      0
1  col1  16  1      1
2  col1  92  0      2
3  col1  77  0      3
4  col2  32  1      0
5  col2  47  1      1
6  col2  22  0      2
7  col2  14  1      3

Then you see if you can find a way to massage df into something, or again working backwards, massage something into df... From this point of view, in this case, the missing link became apparent: something has an index column that df lacked.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download