john doe john doe - 1 month ago 7
Python Question

How to compute the indexed intersection between two similar pandas columns?

I have the following list:

new_pets = ['Bobcats', 'dog', 'cat', 'turtle', 'monkey', 'goat', 'ferret', 'pig', 'Chipmunks', 'Capybaras', 'Ducks']


And the following pandas dataframe:

In: df

0 Cats
1 Lizard
2 Dog
3 Baby Ferrets
4 Pig
5 Armadillo


How can I get into a new column the elements of
new_pets
which appear in
df
(*)?:

In: df['new_col']

0 True
1 False
2 True
3 True
4 True
5 False


From the docs, I noticed that this can be done with contains(), so I tried the following:

result = df[df['pets'].str.contains(x, case = False) for x in new_pets]


However, I am not sure if this is possible. For instance, is it possible to match
Baby Ferrets
with
ferret
, since
Ferrets
is similar to
ferret
?. For that constraint, I tried to use
case=False
, but I am not getting the expected outcome (*). Any idea of how to retrive such strings in a new dataframe?.

Answer

You can first join values by | (regex or is |) and convert all values to lower-case by lower - output is in joined. Then lowercase all values in column by str.lower and call str.contains with joined for checking if bobcats or dog or dog ... is in column:

print (df)
           pets
0          Cats
1        Lizard
2           Dog
3  Baby Ferrets
4           Pig
5     Armadillo

joined = '|'.join(new_pets).lower()

df['new_col'] = df['pets'].str.lower().str.contains(joined)
print (df)
              a new_col
0          Cats    True
1        Lizard   False
2           Dog    True
3  Baby Ferrets    True
4           Pig    True
5     Armadillo   False