Kevin - 7 months ago
Python Question

Convert a column in pandas dataframe from String to Float

I've already read about various solutions, and tried the solution stated here: Pandas: Converting to numeric, creating NaNs when necessary

But it didn't really solve my problem. I have a dataframe with multiple columns, in which the column ['PricePerSeat_Outdoor'] contains some float values, some empty values, and some '-' placeholders.


print type(df_raw['PricePerSeat_Outdoor'][99])
print df_raw['PricePerSeat_Outdoor'][95:101]
df_raw['PricePerSeat_Outdoor'] = df_raw['PricePerSeat_Outdoor'].apply(pd.to_numeric, errors='coerce')
print type(df_raw['PricePerSeat_Outdoor'][99])


Then I got:

<type 'str'>
95 17.21
96 17.24
97 -
98 -
99 17.2
100 17.24
Name: PricePerSeat_Outdoor, dtype: object
<type 'str'>


The values at rows #98 and #99 didn't get converted. I've already tried multiple methods, including the following, but none of them worked. I'd much appreciate it if someone could give me some hints.

df_raw['PricePerSeat_Outdoor'] = df_raw['PricePerSeat_Outdoor'].apply(pd.to_numeric, errors='coerce')


Also, how can I convert multiple columns to numeric at once? Thanks.

Answer

Try passing the whole column to pd.to_numeric directly instead of going through apply:

df_raw['PricePerSeat_Outdoor'] = pd.to_numeric(df_raw['PricePerSeat_Outdoor'], errors='coerce')

Here is an example:

In [97]: a = pd.Series(['17.21','17.34','15.23','-','-','','12.34'])

In [98]: b = pd.Series(['0.21','0.34','0.23','-','','-','0.34'])

In [99]: df = pd.DataFrame({'a':a, 'b':b})

In [100]: df['c'] = np.random.choice(['a','b','b'], len(df))

In [101]: df
Out[101]:
       a     b  c
0  17.21  0.21  a
1  17.34  0.34  b
2  15.23  0.23  b
3      -     -  b
4      -        b
5            -  b
6  12.34  0.34  b

In [102]: cols_to_convert = ['a','b']

In [103]: cols_to_convert
Out[103]: ['a', 'b']

In [104]: for col in cols_to_convert:
   .....:         df[col] = pd.to_numeric(df[col], errors='coerce')
   .....:

In [105]: df
Out[105]:
       a     b  c
0  17.21  0.21  a
1  17.34  0.34  b
2  15.23  0.23  b
3    NaN   NaN  b
4    NaN   NaN  b
5    NaN   NaN  b
6  12.34  0.34  b

Check the resulting dtypes:

In [106]: df.dtypes
Out[106]:
a    float64
b    float64
c     object
dtype: object
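
For the "multiple columns at once" part of the question, the loop can probably be replaced by a single DataFrame.apply call, which runs pd.to_numeric once per selected column and forwards errors='coerce' to it. This is only a minimal sketch: the frame below is a hypothetical stand-in that mirrors the example data (with column c fixed rather than random), not the original df_raw.

```python
import pandas as pd

# Hypothetical sample frame mirroring the example above;
# column 'c' is fixed here instead of randomly generated.
df = pd.DataFrame({
    'a': ['17.21', '17.34', '15.23', '-', '-', '', '12.34'],
    'b': ['0.21', '0.34', '0.23', '-', '', '-', '0.34'],
    'c': ['a', 'b', 'b', 'b', 'b', 'b', 'b'],
})

cols_to_convert = ['a', 'b']

# DataFrame.apply hands each selected column (a Series) to pd.to_numeric
# and passes errors='coerce' through, so unparseable entries become NaN.
df[cols_to_convert] = df[cols_to_convert].apply(pd.to_numeric, errors='coerce')

# Inspect the result: dtypes per column and how many values were coerced to NaN.
print(df.dtypes)
print(df[cols_to_convert].isna().sum())
```

Counting the NaNs afterwards is a quick way to see how many entries in each column could not be parsed as numbers.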