Luis Luis - 4 months ago 12
Python Question

Stack two columns in a DataFrame, repeat others

I have a pandas DataFrame with a structure like this:

df = pd.DataFrame( [
[ 'foo1', 'a', 'z', 'bar1', 1, 4 ],
[ 'foo2', 'b', 'y', 'bar2', 2, 5 ],
[ 'foo3', 'c', 'x', 'bar3', 3, 6 ]
] )
df.columns = [ 'foo', 'let1', 'let2', 'bar', 'num1', 'num2' ]
print( df )





foo let1 let2 bar num1 num2
0 foo1 a z bar1 1 4
1 foo2 b y bar2 2 5
2 foo3 c x bar3 3 6


I want to stack the columns
let1
and
let2
, and add a label telling where they came from. The same for
num1
and
num2
. In the end, I would like to achieve this:

foo let letval bar num numval
0 foo1 let1 a bar1 num1 1
1 foo2 let1 b bar2 num1 2
2 foo3 let1 c bar3 num1 3
3 foo1 let2 z bar1 num2 4
4 foo2 let2 y bar2 num2 5
5 foo3 let2 x bar3 num2 6





So far, I've done this:

let = pd.concat( [ df.let1, df.let2 ] )
num = pd.concat( [ df.num1, df.num2 ] )
df = df.drop( ['let1', 'let2', 'num1', 'num2' ], axis=1 )
df = pd.concat( [ df, df ] )
df[ 'letval' ] = let
df[ 'numval' ] = num
print( df )

foo bar letval numval
0 foo1 bar1 a 1
1 foo2 bar2 b 2
2 foo3 bar3 c 3
0 foo1 bar1 z 4
1 foo2 bar2 y 5
2 foo3 bar3 x 6


However, I am pretty sure that there is an easier way to achieve that, without copying to dummy variables and such workarounds.

Any ideas?

Answer

Here is my attempt to combine @ayhan's solution with the pd.melt() method:

In [191]: (pd.melt(df.drop(['num1','num2'], 1), id_vars=['foo','bar'],
   .....:          var_name='let', value_name='letval')
   .....:    .assign(numval=pd.lreshape(df.filter(like='num'),
   .....:                               {'numval': ['num1', 'num2']})))
Out[191]:
    foo   bar   let letval  numval
0  foo1  bar1  let1      a       1
1  foo2  bar2  let1      b       2
2  foo3  bar3  let1      c       3
3  foo1  bar1  let2      z       4
4  foo2  bar2  let2      y       5
5  foo3  bar3  let2      x       6