user99889 - 9 months ago 104

Python Question

I have

```
In [122]: d=pandas.DataFrame({'d_1':['a','x'],'d_2':['x','y']})

In [123]: d
Out[123]:
  d_1 d_2
0   a   x
1   x   y
```

I want:

```
   a  x  y
0  1  1  0
1  0  1  1
```

I do not want to use `get_dummies` directly:

```
In [139]: pandas.get_dummies(d)
Out[139]:
   d_1_a  d_1_x  d_2_x  d_2_y
0    1.0    0.0    1.0    0.0
1    0.0    1.0    0.0    1.0
```

This function treats `d_1_x` and `d_2_x` as distinct columns, which requires too much memory for my application.

However, I do want to use `get_dummies` because it is fast, so I tried giving every column the same name before applying `get_dummies`, but that does not produce what I want:

```
In [124]: d.columns=['d' for el in d.columns]

In [141]: d
Out[141]:
   d  d
0  a  x
1  x  y

In [151]: pandas.get_dummies(d)
Out[151]:
   d_('d',)  d_('d',)
0       1.0       1.0
1       1.0       1.0
```

Answer Source

You can try something like this:

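```
import pandas as pd

d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})
# For each row, build a Series of 1s indexed by that row's values;
# labels missing from a row become NaN, which fillna(0) turns into 0.
d.apply(lambda x: pd.Series(1, index=x), axis=1).fillna(0)
#      a    x    y
# 0  1.0  1.0  0.0
# 1  0.0  1.0  1.0
```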
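Since the question asks to keep using `get_dummies` for speed, here is a possible alternative that is not part of the original answer: stack all columns into one long Series, encode that once, then collapse back to one row per original index. This is only a sketch; the names `stacked` and `result` are made up for illustration, and it assumes you only care whether a value appears anywhere in a row, not how many times.

```python
import pandas as pd

d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})

# Stack the columns into a single Series indexed by (row, column),
# so 'x' from d_1 and 'x' from d_2 are treated as the same category.
stacked = d.stack()

# One-hot encode the stacked values once, then take the max per
# original row (level 0) to merge the per-cell indicators.
result = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()

print(result)
```

Because the encoding happens on a single Series, each distinct value gets one column no matter which source column it came from. Whether this is faster than the `apply` version on your data is something you would need to measure.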