clwen - 2 months ago
Python Question

Filtering rows by `df.str.split` on columns in pandas

I have a DataFrame that looks like the following:

url1, labela:0.5
url2, labelb:0.4
url3, labelc:0.7


I want to split the labels column on ':' and filter by the numeric value, keeping, say, only rows where it is greater than 0.6. In the above case, that would filter out the rows with url1 and url2, since their values are 0.5 and 0.4, respectively.

I tried the following, but it doesn't work:

df = df[df["labels"].str.split(':').get(1).astype('float') >= 0.6]


I guess what happens is that `get(1)` gives me the second row instead of the imaginary second column I expected after the split. I also tried a bunch of variations of this, but none of them worked. I hope this illustrates my idea, though. What would be an elegant way to do this?

Thanks.

Answer Source

You can use Series.str.split(..., expand=True) to split the column into a DataFrame of parts, select column 1 of that result, convert it to float with Series.astype, and then apply boolean indexing:

In [782]: df[df['labels'].str.split(':', expand=True)[1].astype(float) >= 0.6]
Out[782]: 
    url       labels
2  url3   labelc:0.7
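
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the rows in the question, and the column names `url` and `labels` are assumed from the output above. It also shows why the original attempt fails: `Series.get(1)` looks up the row with index label 1, whereas `.str.get(1)` picks the second element of each row's split list. The `.str.get(1)` variant is an alternative to `expand=True`, not something the answer itself uses.

```python
import pandas as pd

# Sample data rebuilt from the question (column names assumed from the answer's output).
df = pd.DataFrame({
    "url": ["url1", "url2", "url3"],
    "labels": ["labela:0.5", "labelb:0.4", "labelc:0.7"],
})

# Each row becomes a list such as ['labela', '0.5'].
parts = df["labels"].str.split(":")

# Series.get(1) returns the value at index label 1, i.e. the second ROW's list.
second_row = parts.get(1)

# The .str accessor indexes into every row's list instead.
scores = parts.str.get(1).astype(float)

# The approach from the answer: expand=True yields a DataFrame, and [1] selects its second column.
scores_expanded = df["labels"].str.split(":", expand=True)[1].astype(float)

# Boolean indexing keeps only the rows whose score passes the threshold.
filtered = df[scores >= 0.6]
print(filtered)
```

Both `scores` and `scores_expanded` hold the same float values, so either can be used as the mask; `expand=True` is handy when you also want the label part as its own column.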