Kurt - 3 years ago

python pandas if statement based on len

This code gives me the following dataframe:

import pandas as pd

pace=['06:40','10:05','7:25','10:30']
distance=['10','20','30','40']
dd=list(zip(pace,distance))
df=pd.DataFrame(dd,columns=['pace','distance'])

    pace distance
0  06:40       10
1  10:05       20
2   7:25       30
3  10:30       40


If I filter for pace values less than 11:00 with the following code, I get:

threshold = '11:00'  # avoid shadowing the built-in input()
df = df[df['pace'] < threshold]

    pace distance
0  06:40       10
1  10:05       20
3  10:30       40
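The 7:25 row drops out because the pace column holds strings, and Python compares strings character by character rather than as times. A minimal illustration (the variable names are just for the example):

```python
# Python compares strings character by character, left to right, not numerically.
unpadded = '7:25' < '11:00'   # '7' vs '1': '7' sorts after '1', so this is False
padded = '07:25' < '11:00'    # '0' vs '1': '0' sorts before '1', so this is True
print(unpadded, padded)
```

This is why padding every value to the same mm:ss width makes the string comparison behave like a time comparison.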


I would like to add a zero to the beginning of every pace value whose len is 4, so that values like 7:25 are included. I've tried the following code:

if df['pace'].astype(str).map(len)==4:
    df['pace']='0'+df['pace'].astype(str)


This code raises the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Searching for this error only turned up documentation for the | (or) and & (and) operators. Any help would be greatly appreciated.
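The error comes from the if statement itself: df['pace'].astype(str).map(len) == 4 produces a boolean Series with one True/False per row, and Python's if needs a single True or False. A minimal sketch of the usual workaround, which uses that boolean Series as a mask with .loc instead of branching on it (the names mask and error_message are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'pace': ['06:40', '10:05', '7:25', '10:30'],
                   'distance': ['10', '20', '30', '40']})

# Comparing a Series to a scalar gives a boolean Series, one value per row.
mask = df['pace'].astype(str).map(len) == 4

# Using that Series directly in an if statement reproduces the error.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Instead, use the mask to update only the rows that need a leading zero.
df.loc[mask, 'pace'] = '0' + df.loc[mask, 'pace']
print(df)
```

The assignment aligns on the index, so only the row holding 7:25 is changed.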

Answer

How about

df['pace'] = df['pace'].apply(lambda x: x if len(x) > 4 else '0' + x)

The apply() method calls a function on each value of the pace column. Here I used a lambda function that leaves a value unchanged if its len is greater than 4, and otherwise adds a '0' to its beginning.
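A vectorized alternative to the apply() call, offered as a sketch rather than the answerer's method, is pandas' Series.str.zfill, which left-pads each string with zeros up to a given width. It assumes every pace value is a string in m:ss or mm:ss form:

```python
import pandas as pd

df = pd.DataFrame({'pace': ['06:40', '10:05', '7:25', '10:30'],
                   'distance': ['10', '20', '30', '40']})

# zfill(5) pads '7:25' to '07:25' and leaves 5-character values unchanged.
df['pace'] = df['pace'].str.zfill(5)

threshold = '11:00'
# With every value in the same mm:ss shape, string comparison orders correctly.
result = df[df['pace'] < threshold]
print(result)
```

Because all four paces are now below 11:00 as strings, the 7:25 row (shown as 07:25) is kept.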

However, it would probably be cleaner to convert the pace column to datetime.time, like so:

df['pace'] = pd.to_datetime(df['pace']).dt.time
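Once the column holds datetime.time objects, the filter compares times instead of strings. A minimal sketch, assuming an explicit format='%H:%M' (not in the original answer) so that 7:25 and 06:40 parse the same way; note this reads a mm:ss pace as hours:minutes, which keeps the ordering correct for paces under 24 minutes:

```python
import datetime

import pandas as pd

df = pd.DataFrame({'pace': ['06:40', '10:05', '7:25', '10:30'],
                   'distance': ['10', '20', '30', '40']})

# Parse each pace as a time of day; '%H' also accepts the single digit in '7:25'.
df['pace'] = pd.to_datetime(df['pace'], format='%H:%M').dt.time

# Compare against a datetime.time threshold instead of the string '11:00'.
threshold = datetime.time(11, 0)
result = df[df['pace'] < threshold]
print(result)
```

If the paces need arithmetic (for example, averaging), a timedelta representation may suit better than time-of-day values.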