Peadar Coyle - 2 months ago
Python Question

Reversing 'one-hot' encoding in Pandas

Problem statement
I want to go from this data frame, which is essentially one-hot encoded:

In [2]: pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})

Out[2]:
   fox  monkey  rabbit
0    0       0       1
1    0       1       0
2    1       0       0
3    0       0       0
4    0       0       0


to this one, which is 'reverse' one-hot encoded:

In [3]: pd.DataFrame({"animal":["rabbit","monkey","fox"]})
Out[3]:
   animal
0  rabbit
1  monkey
2     fox


I imagine there's some clever use of apply or zip to do this, but I'm not sure how. Can anyone help?

I haven't had much success trying to solve this with indexing.

Answer

I would use apply to decode the columns:

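In [1]: import pandas as pd

In [2]: animals = pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})

In [3]: def get_animal(row):
   ...:     # Return the first column whose flag is 1; rows with no flag give None
   ...:     for c in row.index:
   ...:         if row[c] == 1:
   ...:             return c
   ...:     return None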

In [4]: animals.apply(get_animal, axis=1)
Out[4]: 
0    rabbit
1    monkey
2       fox
3      None
4      None
dtype: object
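
On a larger frame, a row-wise apply can get slow, so a vectorized variant may be worth trying. The following is only a sketch, assuming every row has at most one flag set and that the flags are exactly 0 or 1. The names `decoded` and `has_flag` are made up for illustration. It relies on `DataFrame.idxmax(axis=1)`, which returns the label of the first column holding the row maximum, and then blanks out the all-zero rows, since `idxmax` would otherwise report the first column for them.

```python
import pandas as pd

animals = pd.DataFrame({
    "monkey": [0, 1, 0, 0, 0],
    "rabbit": [1, 0, 0, 0, 0],
    "fox":    [0, 0, 1, 0, 0],
})

# Label of the column holding the largest value in each row
decoded = animals.idxmax(axis=1)

# Rows with no flag set have no meaningful answer, so mark them as missing
has_flag = animals.eq(1).any(axis=1)
decoded = decoded.where(has_flag)

print(decoded)
```

Unlike the apply version, the undecodable rows come back as NaN rather than None. If a row could carry more than one flag, this sketch silently keeps only the first matching column, so it is worth checking that assumption on real data first.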