Montenegrodr - 5 months ago
Python Question

How to remove escaped sequences (e.g. \"escaped string\") using regex?

I am trying to remove quoted sequences from a string. For the example below, my script works fine:

import re
doc = ' Doc = "This is a quoted string: this is cool!" '
cleanr = re.compile('\".*?\"')
doc = re.sub(cleanr, '', doc)
print doc


Result (as expected):

' Doc = '


However, when there is an escaped string inside the quoted sentence, I am not able to remove the escaped sequence using the pattern that I think should be the right one:

import re
doc = ' Doc = "This is a quoted string: \"this is cool!\" " '
cleanr = re.compile('\\".*?\\"') # new pattern
doc = re.sub(cleanr, '', doc)
print doc


Result:

'Doc = this is cool!'


Expected:

'Doc = "This is a quoted string: " '


Does anyone know what is happening? If the pattern
'\\".*?\\"'
is wrong, what would be the right one?

Answer

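doc doesn't contain any backslashes. In a regular string literal, \" is just a plain ", so there are no escaped quotes left in doc, and your pattern ends up matching the ordinary quotes instead.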
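
A quick check makes this visible. The snippet below is only a sketch (the names plain, raw and matches are illustrative, not from the question): it compares the regular literal with its raw counterpart and shows what the question's pattern actually matches.

```python
import re

# Regular literal: \" is read as a plain double quote, so no backslash survives.
plain = ' Doc = "This is a quoted string: \"this is cool!\" " '
# Raw literal: the backslashes are kept as real characters.
raw = r' Doc = "This is a quoted string: \"this is cool!\" " '

print('\\' in plain)     # False
print(raw.count('\\'))   # 2

# The question's pattern, written as a regular literal, is the regex \".*?\"
# and \" in a regex simply means a double quote.
matches = re.findall('\\".*?\\"', plain)
print(matches)           # ['"This is a quoted string: "', '" "']
```

The two matches are exactly the parts that were removed in the question, which is why only this is cool! was left.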

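Add the r prefix to the string literal so that it is treated as a raw string, in which backslashes are kept as literal characters instead of starting escape sequences. Do the same for the pattern, so that \\ reaches the regex engine and matches one literal backslash.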

Try this:

>>> import re
>>> doc = r' Doc = "This is a quoted string: \"this is cool!\" " '
>>> cleanr = re.compile(r'\\".*?\\"')
>>> re.sub(cleanr, '', doc)
' Doc = "This is a quoted string:  " '
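
The same idea can be written as a small standalone script that runs on both Python 2 and Python 3. This is a minimal sketch: the helper name strip_escaped_quotes and the constant ESCAPED_QUOTED are made up for illustration.

```python
import re

# \\" matches one literal backslash followed by a double quote;
# .*? lazily matches the shortest run of characters up to the next \".
ESCAPED_QUOTED = re.compile(r'\\".*?\\"')

def strip_escaped_quotes(text):
    # Remove every backslash-quoted span, including its delimiters.
    return ESCAPED_QUOTED.sub('', text)

doc = r' Doc = "This is a quoted string: \"this is cool!\" " '
print(repr(strip_escaped_quotes(doc)))
# ' Doc = "This is a quoted string:  " '
```

The r prefix only matters for literals typed in source code. Text read from a file or from user input already contains its backslashes as real characters, so it can be passed to the helper unchanged.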