Rijul Magu - 3 years ago
Python Question

How to one-hot-encode from a pandas column containing a list?

I would like to break down a pandas column consisting of a list of elements into as many columns as there are unique elements, i.e. one-hot-encode them (with value 1 representing a given element existing in a row and 0 in the case of absence).

For example, taking the dataframe df:

Col1 Col2 Col3
C 33 [Apple, Orange, Banana]
A 2.5 [Apple, Grape]
B 42 [Banana]


I would like to convert this to:

df

Col1 Col2 Apple Orange Banana Grape
C 33 1 1 1 0
A 2.5 1 0 0 1
B 42 0 0 1 0


How can I use pandas/sklearn to achieve this?

Answer Source

We can also use sklearn.preprocessing.MultiLabelBinarizer:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# pop() removes 'Col3' from df and returns it; each label becomes a 0/1 column
df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('Col3')),
                          columns=mlb.classes_,
                          index=df.index))

Result:

In [77]: df
Out[77]:
  Col1  Col2  Apple  Banana  Grape  Orange
0    C  33.0      1       1      0       1
1    A   2.5      1       0      1       0
2    B  42.0      0       1      0       0
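
Note that the columns above come out in sorted order (Apple, Banana, Grape, Orange), while the question asked for Apple, Orange, Banana, Grape. If the order of first appearance matters, one possible approach is to pass an explicit label order through the classes parameter of MultiLabelBinarizer. The following is a minimal sketch under that assumption; the names first_seen, encoded and result are made up for illustration, and the frame is rebuilt from the question's example so the snippet runs on its own.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Example frame from the question.
df = pd.DataFrame({
    "Col1": ["C", "A", "B"],
    "Col2": [33, 2.5, 42],
    "Col3": [["Apple", "Orange", "Banana"], ["Apple", "Grape"], ["Banana"]],
})

# Labels in order of first appearance; dict.fromkeys keeps insertion
# order and drops duplicates.
first_seen = list(dict.fromkeys(
    label for labels in df["Col3"] for label in labels
))

# Assumption: an explicit `classes` list fixes the output column order
# instead of the default sorted order.
mlb = MultiLabelBinarizer(classes=first_seen)
encoded = pd.DataFrame(mlb.fit_transform(df.pop("Col3")),
                       columns=mlb.classes_,
                       index=df.index)
result = df.join(encoded)
print(result)
```

If the order does not matter, the shorter version in the answer is enough; reordering afterwards with ordinary column selection would also work.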