Stacey - 1 year ago
Python Question

sum dataframe column that contains different data types

I have a data-frame (df) with 2 columns (id and rate) which looks like:

id rate
0 #NAME?
1 #NAME?
2 #NAME?
3 #NAME?
4 #NAME?
5 #NAME?
6 #NAME?
7 #NAME?
8 #NAME?
9 0.5
10 #NAME?
: :
211 0.25
212 0.00
213 #NAME?
214 1.00
215 #NAME?


As you can see, the rate column has more than one type, and I am trying to sum the non-#NAME? entries in the rate column. I have tried:

df = pd.read_csv(full_path, header=0, usecols=[0,8], dayfirst=True,index_col=[0], names=['id', 'rate'])
print(df)
sumRate = sumRate + df['rate'].sum()


but I get the following exception:

TypeError: unsupported operand type(s) for +: 'int' and 'str'


I am unsure how to sum only the floating-point values, and unfortunately the format of the data I'm pulling into the data-frame is out of my control. If anyone can help, it would be much appreciated.

Thanks

Answer Source

I think you need to_numeric with the errors='coerce' parameter to convert the non-numeric values to NaN first, and then sum:

print (pd.to_numeric(df['rate'], errors='coerce'))
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9     0.50
10     NaN
11     NaN
12    0.25
13    0.00
14     NaN
15    1.00
16     NaN
Name: rate, dtype: float64

sumRate = pd.to_numeric(df['rate'], errors='coerce').sum()
print (sumRate)
1.75
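
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The CSV text is made-up sample data (not the asker's file), and the names csv_text, rates, sum_rate, df_na and sum_rate_na are only for illustration. It shows the to_numeric approach from above and, as an alternative that may suit some cases, telling read_csv to treat the placeholder as missing through its na_values parameter:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "id,rate\n0,#NAME?\n1,0.5\n2,#NAME?\n3,0.25\n4,0.00\n5,1.00\n"

# Approach 1: read the column as-is, then coerce anything
# that is not a number to NaN before summing.
df = pd.read_csv(io.StringIO(csv_text), index_col="id")
rates = pd.to_numeric(df["rate"], errors="coerce")
sum_rate = rates.sum()  # sum() skips NaN by default (skipna=True)

# Approach 2 (assumes the placeholder text is known in advance):
# mark "#NAME?" as a missing value while reading, so the column
# is parsed as float64 straight away.
df_na = pd.read_csv(io.StringIO(csv_text), index_col="id", na_values=["#NAME?"])
sum_rate_na = df_na["rate"].sum()

print(sum_rate, sum_rate_na)
```

The first approach is the safer one when you don't know which junk strings may show up, since errors='coerce' turns every unparseable value into NaN. The second only handles the strings you list explicitly.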