john doe john doe - 2 months ago 11
Python Question

How to compute the indexed intersection between two similar pandas columns?

I have the following list:

new_pets = ['Bobcats', 'dog', 'cat', 'turtle', 'monkey', 'goat', 'ferret', 'pig', 'Chipmunks', 'Capybaras', 'Ducks']

And the following pandas dataframe:

In: df

0 Cats
1 Lizard
2 Dog
3 Baby Ferrets
4 Pig
5 Armadillo

How can I get into a new column the elements of
which appear in

In: df['new_col']

0 True
1 False
2 True
3 True
4 True
5 False

From the docs, I noticed that this can be done with contains(), so I tried the following:

result = df[df['pets'].str.contains(x, case = False) for x in new_pets]

However, I am not sure if this is possible. For instance, is it possible to match
Baby Ferrets
, since
is similar to
?. For that constraint, I tried to use
, but I am not getting the expected outcome (*). Any idea of how to retrive such strings in a new dataframe?.


You can first join values by | (regex or is |) and convert all values to lower-case by lower - output is in joined. Then lowercase all values in column by str.lower and call str.contains with joined for checking if bobcats or dog or dog ... is in column:

print (df)
0          Cats
1        Lizard
2           Dog
3  Baby Ferrets
4           Pig
5     Armadillo

joined = '|'.join(new_pets).lower()

df['new_col'] = df['pets'].str.lower().str.contains(joined)
print (df)
              a new_col
0          Cats    True
1        Lizard   False
2           Dog    True
3  Baby Ferrets    True
4           Pig    True
5     Armadillo   False