ShanZhengYang ShanZhengYang - 2 months ago 12
Python Question

How to pivot pandas DataFrame column to create binary "value table"?

I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv("filename.csv")

df
A B C D E
0 a 0.469112 -0.282863 -1.509059 cat
1 c -1.135632 1.212112 -0.173215 dog
2 e 0.119209 -1.044236 -0.861849 dog
3 f -2.104569 -0.494929 1.071804 bird
4 g -2.224569 -0.724929 2.234213 elephant
...


I would like to create more columns based on the identity of categorical values in
column E
such that the dataframe looks like this:

df
A B C D cat dog bird elephant ....
0 a 0.469112 -0.282863 -1.509059 -1 0 0 0
1 c -1.135632 1.212112 -0.173215 0 -1 0 0
2 e 0.119209 -1.044236 -0.861849 0 -1 0 0
3 f -2.104569 -0.494929 1.071804 0 0 -1 0
4 g -2.224569 -0.724929 2.234213 0 0 0 0
...


That is, I pivot the values for column
E
to be a binary matrix based on the values of
E
, giving
1
if the value exists, and
0
for all others where it doesn't (here, I would like it to be
-1
or a "negative binary matrix")?

I'm not sure which function in pandas best does this: maybe
pandas.DataFrame.unstack()
?

Any insight appreciated!

Answer

use pd.concat, drop, and get_dummies

pd.concat([df.drop('E', 1), pd.get_dummies(df.E).mul(-1)], axis=1)

enter image description here

Comments