Python Question

Slicing pandas dataframe by looking for character "in" string

I want to extract a set of rows from a dataframe based on whether a string in that row contains a given substring.

For example, say I have

testdf = pd.DataFrame({'A':['abc','efc','abz'], 'B':[4,5,6]})

I want to get the rows containing the substring 'ab' in column 'A'.

I tried
testdf.loc[lambda df: 'ab' in df['A'], :]
but got the following error:

KeyError Traceback (most recent call last)
KeyError: False

What confuses me is that
testdf.loc[lambda df: df['A'] == 'abc', :]
does not give an error: it returns the one row containing the value 'abc'.
So it appears that there is something about the
'ab' in df['A']
boolean that is not correct...

I am using Python 2.7 and pandas 0.18.1 in a Jupyter (4.0.6) notebook.


Use str.contains:

In [67]:
testdf[testdf['A'].str.contains('ab')]

     A  B
0  abc  4
2  abz  6
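As a minimal sketch of how str.contains can be tuned, the snippet below reuses testdf from the question and adds a hypothetical frame, with_missing, purely to illustrate missing values. The regex, case and na keyword arguments are part of Series.str.contains; which of them you need depends on your data.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Boolean mask: True where the string in column A contains 'ab'
mask = testdf['A'].str.contains('ab')
result = testdf[mask]

# regex=False treats the pattern as a literal substring, not a regular expression
literal = testdf[testdf['A'].str.contains('ab', regex=False)]

# case=False makes the match case-insensitive
any_case = testdf[testdf['A'].str.contains('AB', case=False)]

# na=False counts missing values as non-matches, so the mask stays purely boolean
# (with_missing is a made-up frame for illustration only)
with_missing = pd.DataFrame({'A': ['abc', None, 'abz'], 'B': [4, 5, 6]})
safe = with_missing[with_missing['A'].str.contains('ab', na=False)]

print(result)
```

By default the pattern is interpreted as a regular expression, so regex=False is worth considering when the substring may contain characters such as '.' or '('.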

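What you tried doesn't work, because in applied to a Series tests membership in the index labels, not the values, so the expression below evaluates to the single scalar False: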

In [70]:
'ab' in testdf['A']
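A short sketch of what the in operator actually checks on a Series, using testdf from the question. The variable names (col, label_hit and so on) are illustrative only.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})
col = testdf['A']

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values
label_hit = 0 in col                  # 0 is an index label
value_miss = 'abc' in col             # 'abc' is a value, not a label
substring = 'ab' in col               # neither a label nor tested per element

# To test membership among the values, check against the values explicitly
value_hit = 'abc' in col.tolist()

print(label_hit, value_miss, substring, value_hit)
```

Even the value check above only finds exact matches; it does not look inside each string, which is why a per-element test is needed for substrings.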


What you're really trying to do is test whether 'ab' is in each element of that column:

In [71]:
testdf['A'].apply(lambda x: 'ab' in x)

0     True
1    False
2     True
Name: A, dtype: bool

However, there is no need for apply here when a vectorised method, str.contains, is available.
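The following sketch checks that the apply version and the vectorised version give the same mask on testdf. The names by_apply, by_contains and same are illustrative.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Element-wise Python check, one function call per row
by_apply = testdf['A'].apply(lambda x: 'ab' in x)

# Vectorised string method; regex=False matches the literal substring,
# which mirrors the plain `in` test used above
by_contains = testdf['A'].str.contains('ab', regex=False)

# Both masks agree element by element
same = bool((by_apply == by_contains).all())
print(same)
```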

What you tried here:

testdf.loc[lambda df: 'ab' in testdf['A']]

raised a KeyError because the lambda returned a scalar False, which can't be used to index the whole df. In contrast, testdf.loc[lambda df: df['A'] == 'abc', :] works because df['A'] == 'abc' returns a boolean mask, which can be used to mask the entire df.
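To make the difference concrete, here is a small sketch contrasting the two expressions and showing a callable that does return a usable mask. It assumes testdf from the question; scalar, mask and rows are illustrative names.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

scalar = 'ab' in testdf['A']     # a single bool for the whole column
mask = testdf['A'] == 'abc'      # a boolean Series, one entry per row

# A callable given to loc is called with the DataFrame; returning a
# row-aligned boolean Series makes the result a valid row indexer.
rows = testdf.loc[lambda df: df['A'].str.contains('ab'), :]

print(type(scalar), mask.dtype)
print(rows)
```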

Also, the lambda is unnecessary in the loc:

testdf.loc[testdf['A'] == 'abc', :]

would have worked as well. All the lambda did was pass your df to the expression, which is no different from indexing with the mask directly as above.