Sanchit Sanchit - 1 month ago 10

re.sub() - Regex for replacing the last occurrence of a substring in a string

I'm trying to replace the last occurrence of a substring in a string using re.sub in Python, but I'm stuck on the regex pattern. Can someone help me get the correct pattern?

String = "cr US TRUMP DE NIRO 20161008cr_x080b.wmv"

or String = "crcrUS TRUMP DE NIRO 20161008cr.xml"

I want to remove the last occurrence of "cr" together with everything after it, up to the file extension.

The desired output strings are:

"cr US TRUMP DE NIRO 20161008.wmv"
"crcrUS TRUMP DE NIRO 20161008.xml"
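Before writing any regex, it can help to pin the rule down in plain Python so there is something to check a pattern against. This is a sketch that swaps in `os.path.splitext` and `str.rfind` instead of `re.sub`; `drop_last_cr` is a hypothetical helper name, and it assumes the extension is whatever follows the last dot.

```python
import os.path


def drop_last_cr(name: str) -> str:
    """Hypothetical helper: remove the last "cr" in the file stem and
    everything after it, keeping the extension."""
    stem, ext = os.path.splitext(name)  # e.g. ("...cr_x080b", ".wmv")
    idx = stem.rfind("cr")              # start of the last "cr", or -1 if absent
    if idx == -1:
        return name                     # no "cr" in the stem: leave unchanged
    return stem[:idx] + ext


print(drop_last_cr("cr US TRUMP DE NIRO 20161008cr_x080b.wmv"))
# cr US TRUMP DE NIRO 20161008.wmv
```

For names like the two above, with a single dot that starts a non-empty extension, this gives the desired outputs; names with extra dots in the stem or no extension at all may be handled differently from a regex solution.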

I'm using re.sub to replace it.

re.sub('pattern', '', String)

Please advise.

Answer

You can use this negative lookahead regex:

import re
repl = re.sub(r"cr(?:(?!cr)[^.])*(?=\.[^.]+$)", "", String)
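A quick way to try the pattern on both strings from the question is a small wrapper; `strip_last_cr` is a hypothetical name used only for illustration. The group is written as non-capturing `(?:...)`, which matches the breakdown that follows.

```python
import re

# Pattern from the answer: "cr", then any run of non-dot characters that never
# starts another "cr", ending just before the final ".ext" (lookahead, not consumed).
PATTERN = r"cr(?:(?!cr)[^.])*(?=\.[^.]+$)"


def strip_last_cr(name: str) -> str:
    """Hypothetical wrapper: delete the last "cr" and the rest of the stem."""
    return re.sub(PATTERN, "", name)


for s in ("cr US TRUMP DE NIRO 20161008cr_x080b.wmv",
          "crcrUS TRUMP DE NIRO 20161008cr.xml"):
    print(strip_last_cr(s))
# cr US TRUMP DE NIRO 20161008.wmv
# crcrUS TRUMP DE NIRO 20161008.xml
```

Because `[^.]` cannot cross a dot and the lookahead forces every match to end right before the final dot, at most one match is possible, so `re.sub` does not need a `count` argument here.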


RegEx breakdown:

cr         # match cr
(?:        # non-capturing group start
   (?!     # negative lookahead start
      cr   # match cr
   )       # negative lookahead end
   [^.]    # match anything but DOT
)          # non-capturing group end
*          # repeat 0 or more times: each character matched is not a DOT and does not start another cr
(?=        # positive lookahead start
   \.      # match DOT
   [^.]+   # followed by 1 or more anything but DOT
   $       # end of input
)          # positive lookahead end
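The same breakdown can live in the code itself. This sketch compiles the identical pattern with `re.VERBOSE`, where unescaped whitespace outside character classes is ignored and `#` starts a comment; the constant name `LAST_CR_BEFORE_EXT` is just an illustrative choice.

```python
import re

LAST_CR_BEFORE_EXT = re.compile(
    r"""
    cr              # match the literal "cr"
    (?:             # non-capturing group, repeated 0 or more times:
        (?!cr)      #   as long as another "cr" does not start here,
        [^.]        #   consume one character that is not a dot
    )*
    (?=             # positive lookahead (checked, not consumed):
        \.          #   a literal dot
        [^.]+       #   followed by one or more non-dot characters
        $           #   then the end of the string
    )
    """,
    re.VERBOSE,
)

print(LAST_CR_BEFORE_EXT.sub("", "cr US TRUMP DE NIRO 20161008cr_x080b.wmv"))
# cr US TRUMP DE NIRO 20161008.wmv
```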