Noobie Noobie - 3 months ago 14
Python Question

Tricky wide to long conversion in Pandas

I have a dataframe that looks like

stock date type1 type2 volume_A qtit_A volume_B qtit_B
'ABC' '2013-01-01' 1 2 1000 5 2500 6
'ABC' '2013-01-02' 1 3 4000 10 2500 0


and I would like to reshape it as follows:

stock date type1 type2 volume qtit type
'ABC' '2013-01-01' 1 2 1000 5 A
'ABC' '2013-01-01' 1 2 2500 6 B
'ABC' '2013-01-02' 1 3 4000 10 A
'ABC' '2013-01-02' 1 3 2500 0 B


where you can see that the columns
['volume_A','qtit_A','volume_B','qtit_B']
are broken down in
['volume','qtit']
with a type indicator to remember which type of volume/price we are looking at.

I am struggling to have that done in Pandas in a clean way (using
melt
or
stack()
for instance)

Any ideas?
Thanks!

Answer

If you set ['date','stock','type1','type2'] as the index, then you can split the remaining column labels on '_', create a MultiIndex from these tuples, and then move the A,B labels into the index using stack. reset_index then produces the desired result by moving the index levels back into columns.

import pandas as pd

df = pd.DataFrame({'date': ['2013-01-01', '2013-01-02'],
 'qtit_A': [5, 10],
 'qtit_B': [6, 0],
 'stock': ['ABC', 'ABC'],
 'type1': [1, 1],
 'type2': [2, 3],
 'volume_A': [1000, 4000],
 'volume_B': [2500, 2500]})

df = df.set_index(['date','stock','type1','type2'])
df.columns = pd.MultiIndex.from_tuples([col.split('_', 1) for col in df.columns])
result = df.stack(level=1).reset_index()
result = result.rename(columns={'level_4':'type'})
print(result

yields

         date stock  type1  type2 type  qtit  volume
0  2013-01-01   ABC      1      2    A     5    1000
1  2013-01-01   ABC      1      2    B     6    2500
2  2013-01-02   ABC      1      3    A    10    4000
3  2013-01-02   ABC      1      3    B     0    2500