juggernaut - 4 months ago
Python Question

How to find and replace the nth occurrence of a word in a sentence using a Python regular expression?

Using Python regular expressions only, how can I find and replace the nth occurrence of a word in a sentence?
For example:

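import re

s = 'cat goose mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', s)     # replaces both: 'Bull goose mouse horse pig Bull cow'
new_str = re.sub(r'cat', r'Bull', s, 1)  # replaces only the first 'cat'
new_str = re.sub(r'cat', r'Bull', s, 2)  # replaces both again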


The sentence above contains the word 'cat' twice. I want the 2nd occurrence of 'cat' changed to 'Bull', leaving the 1st 'cat' untouched. My final sentence would look like:
"cat goose mouse horse pig Bull cow". In my code above I tried 3 different calls but could not get what I wanted.

Answer

Use a negative lookahead, as shown below.

>>> import re
>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

  • ^ asserts that we are at the start of the string.
  • (?:(?!cat).)* matches zero or more characters, checking at each position that cat does not begin there, so this part cannot contain cat.
  • cat matches the first cat substring.
  • (?:(?!cat).)* again matches zero or more characters that contain no cat.
  • All of the above is enclosed in a capturing group, ((?:(?!cat).)*cat(?:(?!cat).)*), so that the captured text can be referred to as \1 in the replacement.
  • cat after the group matches the second cat, which is the one that gets replaced.
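
The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same pattern; the names PATTERN and result are chosen here for illustration.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so each
# piece can carry its own comment. Whitespace in the pattern is ignored.
PATTERN = re.compile(r"""
    ^                   # start of the string
    (                   # group 1: everything before the second cat
      (?:(?!cat).)*     #   characters that do not begin a cat
      cat               #   the first cat
      (?:(?!cat).)*     #   characters that do not begin a cat
    )
    cat                 # the second cat, the one that gets replaced
""", re.VERBOSE)

s = "cat goose  mouse horse pig cat cow"
result = PATTERN.sub(r"\1Bull", s)
print(result)  # cat goose  mouse horse pig Bull cow
```

If the string contains only one cat, the pattern does not match and the string comes back unchanged.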

OR

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((.*?cat.*?){1})cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

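Change the number inside the {} to choose which occurrence of the string cat is replaced: {n-1} targets the nth occurrence (for n of 2 or more), because the group has to skip the n-1 occurrences that come before it.

To replace the third occurrence of the string cat, put 2 inside the curly braces: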

>>> re.sub(r'^((.*?cat.*?){2})cat', r'\1Bull', "cat goose  mouse horse pig cat foo cat cow")
'cat goose  mouse horse pig cat foo Bull cow'
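
The {n-1} idea could be wrapped in a small helper along these lines. This is a sketch, not part of the original answer: the function name replace_nth and its parameters are hypothetical, and the n == 1 branch is an addition, since {0} would only match when the string starts with the word.

```python
import re

def replace_nth(text, word, replacement, n):
    """Replace only the n-th occurrence of `word` in `text` (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    target = re.escape(word)  # treat `word` as literal text, not as a pattern
    # A function replacement keeps any backslashes in `replacement` literal.
    if n == 1:
        # {0} would only match when the text starts with the word,
        # so the first occurrence is handled with count=1 instead.
        return re.sub(target, lambda m: replacement, text, count=1)
    # Group 1 lazily skips the first n-1 occurrences, as in the answer.
    pattern = r'^((?:.*?%s.*?){%d})%s' % (target, n - 1, target)
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

print(replace_nth("cat goose  mouse horse pig cat foo cat cow", "cat", "Bull", 3))
# cat goose  mouse horse pig cat foo Bull cow
```

If the text has fewer than n occurrences, the pattern does not match and the text is returned unchanged. Because . does not match a newline by default, this version only looks within the first line of the text unless re.DOTALL is passed as a flag.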
