R-Yue - 3 years ago
Python Question

Replace column values with 0 or 1 based on NaNs

Here is a snapshot of the CSV data file.

I want to replace the null, or 'nan', values with a 0 and replace all other entries with a 1 in the column 'Death Year':

import pandas as pd
import numpy as np
mydata_csv = pd.read_csv(r'D:\Python\character-deaths.csv', sep=',', encoding='utf-8')
mydata_csv
del mydata_csv['Book of Death']
del mydata_csv['Death Chapter']

if mydata_csv['Death Year'] == np.nan:
    mydata_csv['Death Year'] = 0
else:
    mydata_csv['Death Year'] = 1


The above code produces the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer

You have two problems:

  1. A comparison on a Series or DataFrame does not yield a scalar result. It yields a boolean vector, which the if statement cannot reduce to a single True or False.

  2. NaN != NaN, so comparing with == np.nan never holds true, even for the entries that are NaN.

    In [9]: np.nan == np.nan
    Out[9]: False
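
Both points can be checked on a small Series. This is only a sketch: the Series s and its values are made up for illustration and are not taken from the CSV in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Death Year' column.
s = pd.Series([299.0, np.nan, 300.0])

# Problem 1: the comparison is element-wise and returns a boolean Series,
# so using it directly as an if condition raises ValueError.
mask = s == np.nan
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Problem 2: NaN never compares equal to anything, so the mask is all False.
equal_to_nan = mask.tolist()

# isnull() is the reliable way to detect missing values.
is_missing = s.isnull().tolist()
```

Here raised ends up True, equal_to_nan contains only False, and only is_missing flags the middle entry.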
    

Just use np.where, with isnull() to detect the missing values:

mydata_csv['Death Year'] = np.where(mydata_csv['Death Year'].isnull(), 0, 1)
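
To try this without the original file, here is a minimal self-contained sketch. The DataFrame df and its contents are assumptions for illustration; only the 'Death Year' column name comes from the question. It also shows a pandas-only alternative, notnull().astype(int), which should give the same flags.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Death Year': [299.0, np.nan, 300.0],
})

# 0 where the value is missing, 1 otherwise.
flags = np.where(df['Death Year'].isnull(), 0, 1)

# Alternative: notnull() gives booleans, astype(int) maps True/False to 1/0.
flags_alt = df['Death Year'].notnull().astype(int)

df['Death Year'] = flags
```

Both forms work on the whole column at once, so no if statement or loop is needed.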

Another improvement I'd recommend is using df.drop when deleting columns. Instead of del, try the more idiomatic pandas version, which removes several columns in one call:

mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
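
As a rough sketch on a made-up frame (the data below is hypothetical; only the column names come from the question), drop can also take the columns keyword, which is equivalent to passing the labels with axis=1 and names the axis explicitly:

```python
import pandas as pd

# Hypothetical one-row frame with the question's column names.
df = pd.DataFrame({
    'Name': ['A'],
    'Book of Death': [1],
    'Death Chapter': [2],
    'Death Year': [299],
})

# drop returns a new DataFrame; df itself is unchanged unless reassigned.
trimmed = df.drop(columns=['Book of Death', 'Death Chapter'])
remaining = list(trimmed.columns)
```

Because drop returns a new object, the result has to be assigned back (as in the line above) for the change to stick.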