Mike Cuddy - 2 months ago
Python Question

converting an object to float in pandas along with replacing a $ sign

I am fairly new to pandas, and I am working on a project where I have a column that looks like the following:

AverageTotalPayments
$7064.38
$7455.75
$6921.90
ETC


I am trying to pick out the rows where the cost is above a given amount, for example 7000. The column has dtype object, so I assume I cannot compare it to a number directly. My code looks like the following:

import pandas as pd
health_data = pd.read_csv("inpatientCharges.csv")

state = input("What is your state: ")
issue = input("What is your issue: ")
#This line of code will create a new dataframe based on the two letter state code
state_data = health_data[(health_data.ProviderState == state)]
#With the new data set I search it for the injury the person has.
issue_data=state_data[state_data.DRGDefinition.str.contains(issue.upper())]
#I then make it replace the $ sign with a '' so I have a number. I also believe at this point my code may be starting to break down.
issue_data = issue_data['AverageTotalPayments'].str.replace('$', '')
#Since the previous line took out the $ I convert it from an object to a float
issue_data = issue_data[['AverageTotalPayments']].astype(float)
#I attempt to print out the values.
cost = issue_data[(issue_data.AverageTotalPayments >= 10000)]
print(cost)


When I run this code I simply get NaN back, which is not what I want. Any help with what is wrong would be great! Thank you in advance.

Answer

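Extract the numeric part with a regular expression, cast it to float, and then compare. Values that contain no number (such as 'aaa' below) become NaN and compare as False: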

In [83]: df
Out[83]:
  AverageTotalPayments
0             $7064.38
1             $7455.75
2             $6921.90
3                  aaa

In [84]: df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000
Out[84]:
0     True
1     True
2    False
3    False
Name: AverageTotalPayments, dtype: bool

In [85]: df[df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000]
Out[85]:
  AverageTotalPayments
0             $7064.38
1             $7455.75
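
An alternative that stays closer to the code in the question, as a minimal sketch rather than a definitive fix: strip the `$` as a literal character and let `pd.to_numeric` do the conversion. The NaN in the question most likely comes from `issue_data` being overwritten with a single Series by the `str.replace` line, so the later `issue_data[['AverageTotalPayments']]` lookup no longer finds a column with that name (the exact symptom depends on the pandas version). Here the cleaned values are kept in a separate variable and used only as a mask, so the DataFrame itself is left intact. The helper name `payments_above` is hypothetical, and the sample frame simply reuses the four values shown above.

```python
import pandas as pd

# Sample frame reusing the values from the session above.
df = pd.DataFrame(
    {"AverageTotalPayments": ["$7064.38", "$7455.75", "$6921.90", "aaa"]}
)


def payments_above(frame, threshold):
    """Return the rows whose AverageTotalPayments exceeds `threshold`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # regex=False makes pandas treat "$" as a literal character
    # instead of a regular-expression end-of-string anchor.
    cleaned = frame["AverageTotalPayments"].str.replace("$", "", regex=False)
    # errors="coerce" turns values that cannot be parsed (e.g. "aaa") into NaN.
    amounts = pd.to_numeric(cleaned, errors="coerce")
    # Comparing NaN with a number yields False, so unparseable rows drop out.
    return frame[amounts > threshold]


cost = payments_above(df, 7000)
print(cost)
```

In the question's code the same function could be called on `issue_data` after the state and issue filters, for example `payments_above(issue_data, 10000)`.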