Ион Сынкетру - 1 month ago
Python Question

String replacement with pandas

I have a pandas column with string values like:

White bear
Brown Bear
Brown Bear 100 Kg
White bear 200 cm


How can I check whether each string contains the sequence 'White bear' and, if it does, replace the entire value (not only the matched sequence) with a string like 'White_bear'?

df['Species'] = df['Species'].str.replace('White bear', 'White_bear')


This did not work for me because it replaces only the matched sequence and leaves the rest of the value in place.

Answer

You can use boolean indexing:

In [173]: df.loc[df.Species.str.contains(r'\bWhite\s+bear\b'), 'Species'] = 'White_bear'

In [174]: df
Out[174]:
             Species
0         White_bear
1         Brown Bear
2  Brown Bear 100 Kg
3         White_bear
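For reference, here is a self-contained sketch of the boolean-indexing approach above. The sample frame is rebuilt from the question, with one missing value added as an assumption to show how `na=False` keeps the mask strictly boolean. `case=False` is also an addition of mine; drop it if the match should stay case-sensitive, as in the session above.

```python
import pandas as pd

# Sample data from the question, plus one missing value (an assumption,
# added only to illustrate the na=False argument).
df = pd.DataFrame({
    "Species": [
        "White bear",
        "Brown Bear",
        "Brown Bear 100 Kg",
        "White bear 200 cm",
        None,
    ]
})

# Build a boolean mask of rows whose value contains "white bear".
# case=False makes the match case-insensitive; na=False treats missing
# values as non-matches instead of propagating NaN into the mask.
mask = df["Species"].str.contains(r"\bWhite\s+bear\b", case=False, na=False)

# Overwrite the whole value (not just the matched part) in matching rows.
df.loc[mask, "Species"] = "White_bear"

print(df)
```

Rows that do not match, including the missing one, are left untouched.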

Or a slightly more general solution (note that it lowercases the column first, so the patterns match case-insensitively):

In [204]: df
Out[204]:
             Species
0         White bear
1         Brown Bear
2  Brown Bear 100 Kg
3  White bear 200 cm

In [205]: from_re = [r'.*?\bwhite\b\s+\bbear\b.*',r'.*?\bbrown\b\s+\bbear\b.*']

In [206]: to_re = ['White_bear','Brown_bear']

In [207]: df.Species = df.Species.str.lower().replace(from_re, to_re, regex=True)

In [208]: df
Out[208]:
      Species
0  White_bear
1  Brown_bear
2  Brown_bear
3  White_bear
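One side effect of the general solution above is that `str.lower()` also lowercases values that match none of the patterns. A possible variant, sketched below, puts the inline `(?i)` flag into each pattern instead, so non-matching values keep their original casing. The extra "Polar bear" row is an assumption added only to show that behaviour.

```python
import pandas as pd

# Sample data from the question, plus one non-matching row (an assumption,
# added to show that unmatched values are left as they are).
df = pd.DataFrame({
    "Species": [
        "White bear",
        "Brown Bear",
        "Brown Bear 100 Kg",
        "White bear 200 cm",
        "Polar bear",
    ]
})

# Each pattern matches the whole string if it contains the phrase.
# (?i) makes the match case-insensitive without lowercasing the data.
from_re = [
    r"(?i).*?\bwhite\s+bear\b.*",
    r"(?i).*?\bbrown\s+bear\b.*",
]
to_re = ["White_bear", "Brown_bear"]

# Series.replace with regex=True substitutes the matched text; because the
# patterns cover the entire string, the entire value is replaced.
df["Species"] = df["Species"].replace(from_re, to_re, regex=True)

print(df)
```

Adding another species then only requires appending one pattern to `from_re` and one label to `to_re`.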