Rakesh Adk7 - 2 months ago 26
Python Question

How to binarize the values in a pandas DataFrame?

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

I want to convert this to a DataFrame with columns 'Male', 'Female' and 'Unknown', where the values 0 and 1 indicate the Gender.

Gender  Male  Female
Male    1     0
Female  0     1

To do this, I wrote a function and called the function using map.

def isValue(x, value):
    if x == value:
        return 1
    return 0

for value in df['Gender'].unique():
    df[str(value)] = df['Gender'].map(lambda x: isValue(str(x), str(value)))

This works perfectly, but is there a better way to do it? Is there a built-in function in any of the sklearn packages that I can use?


Yes, there is a better way to do this: pd.get_dummies, which is built into pandas, so sklearn is not needed.
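As a minimal sketch of the basic call, assuming the same df as in the question: passing the 'Gender' column to pd.get_dummies returns one indicator column per distinct value. The variable name dummies is only illustrative. Depending on the pandas version, the indicators may come back as booleans or as small integers, which is why the snippet to replicate the original output converts them with astype(int).

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# One indicator column per distinct value of 'Gender'.
# The new columns are named after the values and come out sorted.
dummies = pd.get_dummies(df['Gender'])

print(dummies)
```

Each row has exactly one indicator set, matching the value in the 'Gender' column for that row.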



To replicate what you have:

order = ['Gender', 'Male', 'Female', 'Unknown']
pd.concat([df, pd.get_dummies(df, prefix='', prefix_sep='').astype(int)], axis=1)[order]
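If hard-coding the column order is undesirable, one possible variation (a sketch, not the only way) is to derive the order from the data. The loop in the question created columns in order of first appearance, which df['Gender'].unique() also returns, so it can be used to reorder the sorted get_dummies output. The names dummies, order and result are illustrative, and the dtype=int argument assumes a pandas version that supports it.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# dtype=int requests 0/1 integers directly instead of converting afterwards.
dummies = pd.get_dummies(df['Gender'], dtype=int)

# unique() keeps order of first appearance: Male, Female, Unknown.
order = list(df['Gender'].unique())

# Reorder the indicator columns and attach them to the original frame.
result = df.join(dummies[order])

print(result)
```

This keeps the original 'Gender' column first and follows it with the indicator columns in the same order the loop-based version would have produced.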
