user2539738 - 1 month ago
Python Question

python, pandas, work through bad data

So I've got a very large DataFrame of mostly floats (read from a CSV), but every now and then I get a string or a NaN in a column:

                          date        load
0   2016-07-12 19:04:31.604999           0
...
10  2016-07-12 19:04:31.634999         nan
...
50  2016-07-12 19:04:31.664999  ".942.197"
...


I can deal with the NaNs (interpolate), but I can't figure out how to use replace so that it catches only the strings and leaves the numbers alone:

df.replace(to_replace='^[a-zA-Z0-9_.-]*$',regex=True,value = float('nan'))


This returns all NaNs. I want NaN only where the value is actually a non-numeric string.

Answer

I think you want pandas.to_numeric. It works on Series-like data, and with errors='coerce' any value that cannot be parsed as a number becomes NaN:

>>> import pandas
>>> pandas.to_numeric(df['load'], errors='coerce')
0    0.0
1    NaN
2    NaN
dtype: float64
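
To tie this back to the interpolation step mentioned in the question, here is a minimal sketch of the whole cleanup. The sample frame is made up for illustration (the fourth row and its value are assumptions, not the asker's data), and it assumes the load column came out of the CSV as object dtype, so even the valid entries are strings. That would also explain why the regex in the question turned everything into NaN: its character class matches numeric strings such as "0" just as well as ".942.197".

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the question's data. A CSV column that
# contains a malformed value is typically read as object dtype, so the
# valid numbers arrive as strings too.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-07-12 19:04:31.604999",
        "2016-07-12 19:04:31.634999",
        "2016-07-12 19:04:31.664999",
        "2016-07-12 19:04:31.694999",
    ]),
    "load": ["0", np.nan, ".942.197", "3.0"],
})

# Anything that cannot be parsed as a number becomes NaN;
# parseable strings become floats.
df["load"] = pd.to_numeric(df["load"], errors="coerce")

# Remember which rows were missing or malformed before filling them in.
bad_rows = df["load"].isna()

# Fill the gaps by linear interpolation between the valid neighbours.
df["load"] = df["load"].interpolate()

print(df)
print(bad_rows)
```

Keeping bad_rows around is optional, but it makes it easy to check afterwards how many values were interpolated rather than measured.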