Daulton Sink Daulton Sink - 3 years ago 187
Python Question

How to get column and row from partial string in pandas dataframe efficenitly

How to get column row and value from partial string efficiently with Pandas

I have a pandas dataframe setup with about 150 indexes and 8 columns what I am looking to do is efficiently get the the
column and index for cells based on a partial string. What I came up with was as follows:

df = pd.DataFrame([["foo", "foo", "foo", "foo"], ["foo", "bar", "foo", "foo"], ["bar", "foo", "foo", "bar"],
["foo", "foo", "foo", "bar"]])


0 1 2 3
0 foo foo foo foo
1 foo bar foo foo
2 bar foo foo bar
3 foo foo foo bar

Here if I'm looking for just the entries that contain the sub-string "ar" I employ:

setup_mask = df.applymap(lambda x: "ar" in str(x))
values_hold = []
for x in df.index:
for y in df.columns:
if setup_mask.loc[x, y].any() == bool(True):
if [x, y] not in values_hold:
values_hold.append([x, y])

This works well and returns a list of index column values
[[1, 1], [2, 0], [2, 3], [3, 3]].

This feels unpythonic and really just plain messy is there a way to do something like this in a more pythonic way?

P.S. I know I could cut out the mask but I felt like if there is a more pythonic way it would rely on a mask.

Answer Source

Pandas supports vectorized string operations, but only on one column at a time. So:

df.apply(lambda ser: ser.str.contains('ar'))

Will give you this:

       0      1      2      3
0  False  False  False  False
1  False   True  False  False
2   True  False  False   True
3  False  False  False   True

And it's pretty efficient so long as you have fewer columns than rows (which you do).

If you store the above in mask, then:


Gives you your answer:

array([[1, 1],
       [2, 0],
       [2, 3],
       [3, 3]])
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download