Marwan Alsabbagh - 5 months ago
Python Question

Regular expression multi-line replacement in Python

I would like to search and replace a block of text which contains new line characters.

In the example below, when the DOTALL flag is specified, findall behaves as expected and '.' matches any character, including a newline.
But when calling sub, the DOTALL flag doesn't seem to do anything and no matches are found. I would like to know whether '.' simply can't be used with sub to replace text that contains newline characters, or whether I'm calling the function incorrectly.

Code



import re
text = """
some example text...
START
bla bla
bla bla
END
"""
print 'this works:', re.findall('START.*END', text, re.DOTALL)
print 'this fails:', re.sub('START.*END', 'NEWTEXT', text, re.DOTALL)


Output



this works: ['START\nbla bla\nbla bla\nEND']
this fails:
some example text...
START
bla bla
bla bla
END

Answer

I'm not exactly sure why, but you have to pass the flag with the flags= keyword in re.sub (the docs use it).

print 'this works:', re.sub('START.*END', 'NEWTEXT', text, flags=re.DOTALL)
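For reference, here is a self-contained sketch of the keyword form, written with print() as a function so that it also runs on Python 3. The inline flag (?s) is included only for comparison; it is an addition here, not part of the original answer.

```python
import re

text = """
some example text...
START
bla bla
bla bla
END
"""

# Keyword form: re.DOTALL is unambiguously passed as flags.
with_keyword = re.sub('START.*END', 'NEWTEXT', text, flags=re.DOTALL)

# Inline form: (?s) at the start of the pattern turns on DOTALL
# without using the flags argument at all.
with_inline = re.sub('(?s)START.*END', 'NEWTEXT', text)

print(repr(with_keyword))  # '\nsome example text...\nNEWTEXT\n'
print(repr(with_inline))   # same result as the keyword form
```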

It might be because of the optional count argument.

EDIT:

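I think it is because of the count argument after all. The signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is taken as count, not flags. This works as well: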

print 'this works:', re.sub('START.*END', 'NEWTEXT', text, 0, re.DOTALL)

A count of 0 means replace all occurrences.
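A quick way to check the count explanation is to pass the integer value of re.DOTALL as count explicitly. The sketch below assumes CPython, where that value is 16; the variable names are only illustrative.

```python
import re

text = """
some example text...
START
bla bla
bla bla
END
"""

# re.DOTALL is an integer-valued flag (16 on CPython).
dotall_value = int(re.DOTALL)

# This is what the failing call effectively did: the flag landed in the
# count slot, so DOTALL was never enabled and '.' could not cross newlines.
as_count = re.sub('START.*END', 'NEWTEXT', text, count=dotall_value)

# With a pattern that matches many times, the same value visibly acts
# as a limit on the number of replacements.
limited = re.sub('a', 'X', 'a' * 20, count=dotall_value)

print(as_count == text)  # True: nothing was replaced
print(limited)           # XXXXXXXXXXXXXXXXaaaa
```

So the original call raised no error because the flag is a valid integer; it was just read as "replace at most 16 matches".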