mik.ferrucci - 29 days ago
Python Question

Pandas converting column of strings and NaN (floats) to integers, keeping the NaN

I am having trouble converting a column that contains both two-digit numbers stored as strings (type: str) and NaN (type: float64). I want a new column that has NaN where there was NaN and an integer where there was a two-digit number in string format.
As an example, I want to obtain column YearBirth2 from column YearBirth1 like this:

YearBirth1 #numbers here are formatted as strings: type(YearBirth1[0])=str
34 # and NaN are floats: type(YearBirth1[2])=float64.
76
Nan
09
Nan
91

YearBirth2 #numbers here are formatted as integers: type(YearBirth2[0])=int
34 #NaN can remain floats as they were.
76
Nan
9
Nan
91


I have tried this:

csv['YearBirth2'] = (csv['YearBirth1']).astype(int)


As I expected, I got this error:

ValueError: cannot convert float NaN to integer


So I tried this:

csv['YearBirth2'] = (csv['YearBirth1']!=NaN).astype(int)


And got this error:

NameError: name 'NaN' is not defined


Finally I have tried this:

csv['YearBirth2'] = (csv['YearBirth1']!='NaN').astype(int)


No error, but when I checked the column YearBirth2, this was the result:

YearBirth2:
1
1
1
1
1
1


Not what I wanted. I think the idea is right, but I cannot get Python to understand what I mean by NaN. Or maybe the method I tried is wrong.

I also tried the pd.to_numeric() method, but that way I get floats, not integers.

Any help?!
Thanks to everyone!

P.S.: csv is the name of my DataFrame.
Sorry if I am not very clear, I am still improving my English!

Answer

You can use to_numeric, but a column with the default NumPy integer dtype cannot hold NaN values, so they are always converted to float (see "NA type promotions" in the pandas documentation).

import pandas as pd
# errors='coerce' turns anything that cannot be parsed as a number into NaN
df['YearBirth2'] = pd.to_numeric(df.YearBirth1, errors='coerce')
print(df)
  YearBirth1  YearBirth2
0         34        34.0
1         76        76.0
2        Nan         NaN
3         09         9.0
4        Nan         NaN
5         91        91.0
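
If integers are really needed next to the missing values, one possible workaround is the nullable integer dtype "Int64" (capital I), which newer pandas versions provide. The sketch below is not part of the original answer. It assumes a pandas version that supports "Int64", and it rebuilds the question's column by hand as a hypothetical sample frame. It also shows why the `!= 'NaN'` attempt returned only ones: the comparison produces a boolean mask, and a float NaN is never equal to the string 'NaN'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame rebuilt from the question:
# two-digit strings mixed with float NaN in an object column.
df = pd.DataFrame(
    {"YearBirth1": ["34", "76", np.nan, "09", np.nan, "91"]},
    dtype=object,
)

# The attempt from the question: every element differs from the
# string 'NaN' (including the real float NaN), so the mask is all
# True and astype(int) turns it into ones.
mask = (df["YearBirth1"] != "NaN").astype(int)

# Step 1: parse the strings; the result is float64 because of NaN.
as_float = pd.to_numeric(df["YearBirth1"], errors="coerce")

# Step 2: cast to the nullable integer dtype. The values stay
# integers and the missing entries are kept as missing (<NA>).
df["YearBirth2"] = as_float.astype("Int64")

print(df["YearBirth2"].dtype)  # Int64
print(df)
```

To test for missing values directly, `df["YearBirth1"].isna()` is the usual check, since NaN does not compare equal to anything, itself included. Note that the cast in step 2 only works here because every parsed value is a whole number.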