mik.ferrucci - 2 months ago

Python Question

I am having trouble converting a column that contains both two-digit numbers stored as strings (type: str) and NaN (type: float64). I want to obtain a new column with NaN where there was NaN and an integer where there was a two-digit number in string format.

As an example, I want to obtain column YearBirth2 from column YearBirth1 like this:

`YearBirth1 #numbers here are formatted as strings: type(YearBirth1[0])=str`

34 # and NaN are floats: type(YearBirth1[2])=float64.

76

Nan

09

Nan

91

YearBirth2 #numbers here are formatted as integers: type(YearBirth2[0])=int

34 #NaN can remain floats as they were.

76

Nan

9

Nan

91

I have tried this:

`csv['YearBirth2'] = (csv['YearBirth1']).astype(int)`

And, as I expected, I got this error:

`ValueError: cannot convert float NaN to integer`

So I tried this:

`csv['YearBirth2'] = (csv['YearBirth1']!=NaN).astype(int)`

And got this error:

`NameError: name 'NaN' is not defined`

Finally I have tried this:

`csv['YearBirth2'] = (csv['YearBirth1']!='NaN').astype(int)`

No error, but when I checked the column YearBirth2, this was the result:

`YearBirth2:`

1

1

1

1

1

1

Very bad. I think the idea is right, but I cannot make Python understand what I mean by NaN. Or maybe the method I tried is wrong.

I also tried the pd.to_numeric() method, but that way I get floats, not integers.

Any help?!

Thanks to everyone!

P.S.: csv is the name of my DataFrame.

Sorry if I am not very clear; I am still improving my English!

Answer

You can use `to_numeric`, but it is impossible to get `int` together with `NaN` values: they are always converted to `float`. See NA type promotions.

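```
import pandas as pd

# df is the DataFrame holding YearBirth1 (named csv in the question).
# errors='coerce' turns anything that cannot be parsed into NaN.
df['YearBirth2'] = pd.to_numeric(df.YearBirth1, errors='coerce')
print(df)
#   YearBirth1  YearBirth2
# 0         34        34.0
# 1         76        76.0
# 2        Nan         NaN
# 3         09         9.0
# 4        Nan         NaN
# 5         91        91.0
```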
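
If a float column is not acceptable, one possible workaround is the nullable integer dtype `Int64` (capital "I"). The float promotion described above applies to the ordinary NumPy-backed `int` dtypes, while `Int64` stores missing values as `<NA>`. The sketch below assumes pandas 0.24 or later, and the sample DataFrame is a hypothetical reconstruction of the data in the question, not the asker's actual file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question:
# two-digit strings mixed with real NaN floats.
df = pd.DataFrame({'YearBirth1': ['34', '76', np.nan, '09', np.nan, '91']})

# Step 1: parse the strings. Missing or unparseable entries become NaN,
# so the intermediate result is a float64 Series.
as_float = pd.to_numeric(df['YearBirth1'], errors='coerce')

# Step 2: cast to the nullable integer dtype, which keeps whole numbers
# as integers and represents the missing entries as <NA>.
df['YearBirth2'] = as_float.astype('Int64')

print(df)
print(df['YearBirth2'].dtype)
```

Note that the missing entries in `YearBirth2` are then `pd.NA` rather than float `NaN`, so this only fits if the rest of the code does not rely on them staying floats.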