spiff spiff - 2 months ago 6
Python Question

Pandas: remove rows which contain any string

A very basic question, guys - thanks very much for taking a look. I want to remove the rows where Col1 contains any string; I care only about numeric values in Col1.

Input:

    Col1  Col2 Col3
0    123  48.0  ABC
1     45  85.0  DEF
2  A.789  66.0  PQR
3  RN.35   9.0  PQR
4    LMO  12.0  ABC


Output:

    Col1  Col2 Col3
0  123.0  48.0  ABC
1   45.0  85.0  DEF


I tried

test = input_[input_['Col1'].str.contains(r'ABCDEGGHIJKLMNOPQRSTUVWXYZ.')]


But I see this error:

ValueError: cannot index with vector containing NA / NaN values


Could you:
- Give a short explanation of why that's not working?
- Suggest an alternative solution, please?

Thanks a lot again!

Answer

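Do it like this:

import re
regex = re.compile("[a-zA-Z]+")
# Keep rows whose Col1 has no letters; str(x) lets non-string (numeric) entries be searched too.
# .loc replaces .ix, which has been removed from pandas.
df.loc[df['Col1'].map(lambda x: regex.search(str(x)) is None)]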
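
On the "why" part: a likely cause, assuming Col1 is an object column that mixes real integers (123, 45) with strings, is that .str.contains returns NaN for every non-string entry, so the mask is not purely boolean and pandas refuses to index with it. Note also that the pattern in the question matches that literal run of letters followed by any one character, not "any letter"; a character class such as [A-Za-z] is needed for that. The sketch below rebuilds the sample frame under that assumption (the names df, mask, numeric and result are mine, not from the question) and uses pd.to_numeric with errors="coerce" as one alternative way to keep only the numeric rows:

```python
import pandas as pd

# Sample data rebuilt from the question; ASSUMPTION: 123 and 45 are stored
# as integers, the other Col1 entries as strings (object dtype overall).
df = pd.DataFrame({
    "Col1": [123, 45, "A.789", "RN.35", "LMO"],
    "Col2": [48.0, 85.0, 66.0, 9.0, 12.0],
    "Col3": ["ABC", "DEF", "PQR", "PQR", "ABC"],
})

# .str methods yield NaN for entries that are not strings, so this mask
# holds NaN in rows 0 and 1 and cannot be used directly for indexing.
mask = df["Col1"].str.contains(r"[A-Za-z]")

# Alternative: convert Col1 to numbers; anything unparseable becomes NaN.
numeric = pd.to_numeric(df["Col1"], errors="coerce")

# Keep only the rows that converted cleanly and store Col1 as floats.
result = df[numeric.notna()].assign(Col1=numeric)
print(result)
```

Passing na=False to .str.contains is another way to get a purely boolean mask, but it leaves Col1 as objects rather than converting it to floats.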