Rakesh Adk7 - 1 month ago
Python Question

How to binarize the values in a pandas DataFrame?

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])


I want to convert this to a DataFrame with the additional columns 'Male', 'Female' and 'Unknown', where the values 0 and 1 indicate the Gender.

Gender  Male  Female
Male    1     0
Female  0     1
...


To do this, I wrote a function and called the function using map.

def isValue(x, value):
    if x == value:
        return 1
    else:
        return 0


for value in df['Gender'].unique():
    df[str(value)] = df['Gender'].map(lambda x: isValue(str(x), str(value)))


This works perfectly, but is there a better way to do it? Is there a built-in function in any sklearn package that I can use?

Answer

Yes, there is a better way to do this. It's built into pandas and is called pd.get_dummies:

pd.get_dummies(df)
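
As a minimal sketch of what that call returns on the DataFrame from the question: by default each indicator column is named after the source column plus the value. The dtype of the indicator columns differs between pandas versions, so the cast to int below is an assumption made here to get plain 0/1 values.

```python
import pandas as pd

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'],
                  columns=['Gender'])

# One indicator column per distinct value, named <column>_<value>.
# The original 'Gender' column is replaced by its indicator columns.
dummies = pd.get_dummies(df).astype(int)

print(dummies.columns.tolist())
print(dummies)
```

Note that the indicator columns come back in sorted order of the values, not in order of first appearance.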


To replicate what you have:

order = ['Gender', 'Male', 'Female', 'Unknown']
pd.concat([df, pd.get_dummies(df, prefix='', prefix_sep='').astype(int)], axis=1)[order]
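
One way to convince yourself that this matches the loop from the question is to build both versions side by side and compare them. This is only a sketch: the names looped, result and same are made up for illustration, and the loop is inlined rather than calling isValue.

```python
import pandas as pd

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'],
                  columns=['Gender'])
order = ['Gender', 'Male', 'Female', 'Unknown']

# Loop-based version from the question, applied to a copy of df.
looped = df.copy()
for value in df['Gender'].unique():
    looped[str(value)] = df['Gender'].map(
        lambda x: 1 if str(x) == str(value) else 0)

# get_dummies version: an empty prefix and separator keep the bare
# value as the column name ('Male' instead of 'Gender_Male').
dummies = pd.get_dummies(df, prefix='', prefix_sep='').astype(int)
result = pd.concat([df, dummies], axis=1)[order]

# Element-wise comparison of the two frames, reduced to a single bool.
same = (result == looped[order]).all().all()
print(same)
```

The comparison is done on values rather than with DataFrame.equals, since the integer dtype produced by astype(int) can vary by platform.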
