user99889 - 21 days ago
Python Question

pandas get_dummies with identical column names

I have

In [122]: d=pandas.DataFrame({'d_1':['a','x'],'d_2':['x','y']})

In [123]: d
Out[123]:
  d_1 d_2
0   a   x
1   x   y


I want:

   a  x  y
0  1  1  0
1  0  1  1


I do not want to use

In [139]: pandas.get_dummies(d)
Out[139]:
   d_1_a  d_1_x  d_2_x  d_2_y
0    1.0    0.0    1.0    0.0
1    0.0    1.0    0.0    1.0


This function treats d_1_x and d_2_x as distinct columns, and the extra columns require too much memory for my application.

I do, however, want to use get_dummies because it is fast, so I tried renaming the columns and applying get_dummies:

In [124]: d.columns=['d' for el in d.columns]

In [141]: d
Out[141]:
   d  d
0  a  x
1  x  y

In [151]: pandas.get_dummies(d)
Out[151]:
   d_('d',)  d_('d',)
0       1.0       1.0
1       1.0       1.0

Answer

You can try something like this:

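import pandas as pd

# For each row, build a Series of 1s indexed by that row's values;
# values absent from a row come back as NaN and are filled with 0.
d.apply(lambda row: pd.Series(1, index=row), axis=1).fillna(0)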

#      a    x    y
# 0  1.0  1.0  0.0
# 1  0.0  1.0  1.0
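
Since the question asks to keep using get_dummies, here is an alternative sketch, not part of the original answer. It assumes a pandas version where get_dummies accepts the dtype argument, and the names dummies and result are only illustrative. Passing an empty prefix and separator names each dummy column by its value alone, so the duplicate x columns can then be merged by grouping on the column label:

```python
import pandas as pd

d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})

# With an empty prefix and separator, dummy columns are named by value only,
# so 'x' appears twice (once from d_1, once from d_2).
dummies = pd.get_dummies(d, prefix='', prefix_sep='', dtype=int)

# Merge same-named columns: transpose, group rows by label, take the max,
# then transpose back to the original orientation.
result = dummies.T.groupby(level=0).max().T
print(result)
```

Note that the intermediate frame still holds one dummy column per source column and value, so this may not lower peak memory use by itself; it only collapses the final result.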