mkln - 8 months ago

Create dummies from column with multiple values in pandas

I am looking for a pythonic way to handle the following problem.


The pandas get_dummies() method is great for creating dummies from a categorical column of a DataFrame. For example, if the column has values in
['A', 'B']
get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.
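A minimal sketch of that basic case, assuming a small illustrative DataFrame. `dtype=int` is passed because pandas 2.0 and later return boolean dummy columns by default.

```python
import pandas as pd

# Illustrative data: a single categorical column with two distinct values
df = pd.DataFrame({"label": ["A", "B", "A"]})

# One 0/1 column per distinct value ("A" and "B")
dummies = pd.get_dummies(df["label"], dtype=int)
print(dummies)
```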

Now I need to handle the following situation. A single column, let's call it 'label', has values like
['A', 'B', 'C', 'D', 'A*C', 'C*D']
get_dummies() creates 6 dummies from it, but I only want 4 of them (A, B, C and D), so that a row can have multiple 1s (for example, 'A*C' should set both A and C to 1).
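To illustrate the problem, here is a short sketch with the values from the question: plain get_dummies() treats every full string, including the combined ones, as its own category.

```python
import pandas as pd

df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Each distinct string is a separate category, so "A*C" and "C*D"
# get their own columns instead of setting A, C and D
naive = pd.get_dummies(df["label"], dtype=int)
print(list(naive.columns))
```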

Is there a pythonic way to handle this? I could only think of a step-by-step algorithm, but that would not use get_dummies().
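For comparison, the step-by-step route might look like the sketch below. It assumes '*' is the only separator and builds the columns by hand instead of calling get_dummies().

```python
import pandas as pd

df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Step 1: split each label into its component categories
# (a single-character pattern is treated literally, not as a regex)
parts = df["label"].str.split("*")

# Step 2: collect the distinct categories across all rows
categories = sorted({c for row in parts for c in row})

# Step 3: one 0/1 column per category, set to 1 when the row contains it
dummies = pd.DataFrame(
    {cat: parts.apply(lambda row: int(cat in row)) for cat in categories},
    index=df.index,
)
print(dummies)
```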

Edited, hope it is more clear!


I know it's been a while since this question was asked, but there is now a one-liner built entirely from documented pandas string methods:

In [4]: df
Out[4]:
       label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
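The str.join step is only needed because the example column holds tuples. With the question's data, where the labels are already '*'-joined strings, a minimal sketch can call Series.str.get_dummies directly:

```python
import pandas as pd

df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Each value is split on the literal separator "*" and every part
# becomes its own 0/1 column, so "A*C" sets both A and C to 1
dummies = df["label"].str.get_dummies(sep="*")
print(dummies)
```

The result can be attached to the original frame with df.join(dummies) if the dummy columns are needed alongside 'label'.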