Cees Timmerman - 6 months ago
Python Question

re.DOTALL works for re.match but not re.sub?

Why does this match as expected but fail to substitute? A single-line s works fine.

import re
s = """<script>
wut</script>"""
print(re.match('<script(.*?)</script>', s, re.DOTALL).groups())
# Prints ('>\nwut',)
print(re.sub('<script(.*?)</script>', '', s, re.DOTALL))
# Prints <script>
# wut</script>

I just want to understand this; no need to suggest Beautiful Soup or manual parsing.


The 4th parameter to re.sub is count, not flags. You can use:

>>> print(re.sub('<script.*?</script>', '', s, 0, re.DOTALL))

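Here we're passing count=0, which means every match is replaced. In your call, re.DOTALL ended up in the count slot, so no flags were applied and . never matched the newline.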

Signature of re.sub is:

re.sub(pattern, repl, string, count=0, flags=0)
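Given that signature, a minimal sketch of two ways to avoid the mix-up: pass flags by keyword, or put the inline (?s) flag in the pattern itself. The variable names are only for illustration, and s is assumed to be the two-line string from the question. The first call spells out count=re.DOTALL to show what the positional call in the question amounts to.

```python
import re

# Assumed to be the two-line input from the question.
s = """<script>
wut</script>"""

pattern = r'<script(.*?)</script>'

# What the question's call effectively does: re.DOTALL is used as count,
# no flags are set, and '.' cannot cross the newline, so nothing matches.
unchanged = re.sub(pattern, '', s, count=re.DOTALL)

# Keyword argument: flags is set explicitly, so '.' also matches '\n'.
by_keyword = re.sub(pattern, '', s, flags=re.DOTALL)

# Inline flag: (?s) at the start of the pattern enables DOTALL
# without any extra argument.
by_inline = re.sub(r'(?s)<script(.*?)</script>', '', s)

print(repr(unchanged))
print(repr(by_keyword))
print(repr(by_inline))
```

Using the keyword form keeps the call readable and does not depend on remembering the order of count and flags.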