JonathanBechtel - 3 months ago
Python Question

Iterating Over Every Item in a Series in Pandas With A Custom Function

I have a DataFrame in Pandas that looks like this:

            Player     Year  Height
1    Stephen Curry  2015-16     6-3
2  Mirza Teletovic  2015-16    6-10
3       C.J. Miles  2015-16     6-7
4 Robert Covington  2015-16     6-9


Right now data['Height'] stores its values as strings, and I'd like to convert them to inches, stored as integers, for further calculation.

I've tried a few approaches, including what's listed in the Pandas documentation, but to no avail.

First Attempt

def true_height(string):
    new_str = string.split('-')
    inches1 = new_str[0]
    inches2 = new_str[1]

    inches1 = int(inches1)*12
    inches2 = int(inches2)

    return inches1 + inches2


If you run

true_height(data.iloc[0, 2])


It returns 75, the correct answer.

To run it on the entire Series, I added .str to this line of code:

new_str = string.str.split('-')


And then ran:

data['Height'].apply(true_height(data['Height']))


And got the following error message:

int() argument must be a string or a number, not 'list'
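To see where this error comes from, here is a minimal sketch. It assumes a small stand-in Series with a default 0-based index; the name heights is hypothetical. Because Series.str.split works element by element, new_str becomes a Series of lists, so new_str[0] is a whole list rather than the string "6".

```python
import pandas as pd

# Hypothetical stand-in for data['Height'] (assumes a default 0-based index)
heights = pd.Series(["6-3", "6-10", "6-7", "6-9"])

# Series.str.split splits every element, returning a Series of lists
new_str = heights.str.split("-")

# new_str[0] is therefore the entire first list, not the string "6"
first_item = new_str[0]
print(first_item)  # ['6', '3']

# int() cannot convert a list, which reproduces the reported TypeError
try:
    int(first_item)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```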


I then tried using a for loop, thinking that might do the trick, so I modified the original function to this:

def true_height(strings):
    for string in strings:
        new_str = string.split('-')
        inches1 = new_str[0]
        inches2 = new_str[1]

        inches1 = int(inches1)*12
        inches2 = int(inches2)

        return inches1 + inches2


Now, when I run:

data['Height'].apply(true_height(data['Height']))

I get the following error:

'int' object is not callable


I'm a little stumped. Any help would be appreciated. Thank you.

Answer

You can split the Height column into lists with str.split, then call apply on the result with a lambda function that does the conversion:

df['Height'] = df.Height.str.split("-").apply(lambda x: int(x[0]) * 12 + int(x[1]))

df
#             Player       Year    Height
# 1    Stephen Curry    2015-16        75
# 2  Mirza Teletovic    2015-16        82
# 3       C.J. Miles    2015-16        79
# 4 Robert Covington    2015-16        81
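If you would rather avoid apply altogether, a vectorized variant is also possible. This is a different technique from the lambda above, sketched under the assumption that every Height value contains exactly one "-" separating feet from inches (a missing or malformed value would make astype(int) fail). With expand=True, str.split returns a DataFrame with one column per piece, and the two columns can then be combined with ordinary column arithmetic. The sample frame is reconstructed from the rows in the question.

```python
import pandas as pd

# Sample frame reconstructed from the rows shown in the question
df = pd.DataFrame(
    {
        "Player": ["Stephen Curry", "Mirza Teletovic", "C.J. Miles", "Robert Covington"],
        "Year": ["2015-16"] * 4,
        "Height": ["6-3", "6-10", "6-7", "6-9"],
    },
    index=[1, 2, 3, 4],
)

# expand=True yields a DataFrame with columns 0 (feet) and 1 (inches)
feet_inches = df["Height"].str.split("-", expand=True).astype(int)

# Whole-column arithmetic, no per-element Python function call
df["Height"] = feet_inches[0] * 12 + feet_inches[1]
print(df)
```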

Or use your originally defined true_height function (1st attempt) with apply:

df['Height'] = df.Height.apply(true_height)

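You don't need to pass df.Height to the function yourself: apply takes the function itself as its argument and calls it on each element of the Series. Writing apply(true_height(data['Height'])) calls true_height on the whole Series first and hands its return value, not a function, to apply; in your second attempt that value is an int, which is why you saw 'int' object is not callable.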
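Here is a self-contained sketch of passing the function object to apply. The true_height below is a compact rewrite of the first-attempt function using tuple unpacking (same logic, just shorter), and heights is a hypothetical stand-in for the Height column.

```python
import pandas as pd

def true_height(string):
    """Convert a 'feet-inches' string such as '6-3' to total inches."""
    feet, inches = string.split("-")
    return int(feet) * 12 + int(inches)

# Hypothetical stand-in for df['Height']
heights = pd.Series(["6-3", "6-10", "6-7", "6-9"])

# Pass the function object itself (no parentheses); apply calls it once per element
converted = heights.apply(true_height)
print(converted.tolist())  # [75, 82, 79, 81]
```

Note the absence of parentheses after true_height inside apply: with them, Python would evaluate the call before apply ever runs.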
