piRSquared piRSquared - 2 months ago 7
Python Question

How to use the `query` method to check if elements of a column contain a specific string

I know how to check if a column contains a string. My preferred method is to use `.str.contains`. However, that returns a boolean array that I have to use as a mask on the original dataframe. The convenience of `query` is that it returns the already filtered dataframe.

Consider the dataframe `df`:


import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(list('abcdefghijklmno')).reshape(5, 3),
                  columns=list('XYZ')).add('w')
df



Using `str.contains`:


df[df.Y.str.contains('b')]
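
To make the two steps of the mask approach explicit, here is a minimal sketch using the same `df` as above. The names `mask` and `filtered` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: single letters with 'w' appended.
df = pd.DataFrame(np.array(list('abcdefghijklmno')).reshape(5, 3),
                  columns=list('XYZ')).add('w')

# Step 1: str.contains returns a boolean Series aligned with df's index.
mask = df['Y'].str.contains('b')

# Step 2: the mask still has to be applied to the frame to get the rows.
filtered = df[mask]
print(filtered)
```

Only the row whose `Y` value contains `b` survives the second step.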



But I would prefer to use `query`:


df.query('Y == "bw"')



The problem is, I don't know how to use `query` to check for substrings. I want something similar to this:

df.query('Y like "b%"')

Answer

This is currently not supported. `query` implements only a subset of operations and essentially none of the string functions.
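
In more recent pandas releases, string accessor methods can often be evaluated inside `query` when the expression is handed to the Python engine instead of numexpr. The following is a minimal sketch under that assumption. It is not something this answer claims, and older pandas versions may raise an error instead.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(np.array(list('abcdefghijklmno')).reshape(5, 3),
                  columns=list('XYZ')).add('w')

# Assumption: the installed pandas can evaluate .str methods inside query
# when engine='python' is requested; this may not hold on older versions.
result = df.query('Y.str.contains("b")', engine='python')
print(result)
```

If the call is rejected by the installed version, the boolean mask from the question remains the fallback.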

As a side note to the comment: `query` does support a vectorized version of the `in` keyword.

df.query('X in ["aw", "dw"]')
Out[9]: 
    X   Y   Z
0  aw  bw  cw
1  dw  ew  fw
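
The same membership test can also take its values from a local Python variable, and it has a `not in` counterpart. A minimal sketch follows; the names `wanted`, `matches` and `others` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(np.array(list('abcdefghijklmno')).reshape(5, 3),
                  columns=list('XYZ')).add('w')

# '@' refers to a local Python variable from inside the query expression.
wanted = ['aw', 'dw']
matches = df.query('X in @wanted')

# 'not in' selects the complementary rows.
others = df.query('X not in @wanted')
print(matches)
print(others)
```

Note that this is still exact matching against whole values, not a substring search.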