Cees Timmerman - 2 months ago
Python Question

re.DOTALL works for re.match but not re.sub?

Why does this match as expected but fail to substitute? A single-line s works fine.

import re
s = """<script>
wut</script>"""
print(re.match('<script(.*?)</script>', s, re.DOTALL).groups())
# Returns ('>\nwut',)
print(re.sub('<script(.*?)</script>', '', s, re.DOTALL))
# Returns <script>
# wut</script>


I just want to understand this; no need to suggest Beautiful Soup or manual parsing.

Answer

The 4th parameter to re.sub is count, not flags, so re.DOTALL (integer value 16) is taken as the replacement count and no flags are set. You can use:

>>> re.sub('<script.*?</script>', '', s, 0, re.DOTALL)
''

Here we pass count=0, which means there is no limit on the number of replacements.
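To see why the call in the question left s untouched, here is a small sketch. It spells out count by keyword to mimic what the positional re.DOTALL ends up doing. The names unchanged and capped are just for illustration.

```python
import re

s = """<script>
wut</script>"""

# re.DOTALL is an integer-valued flag; its value is 16.
dotall_value = int(re.DOTALL)
print(dotall_value)

# In the question, re.DOTALL lands in the count slot, so the call behaves
# like count=16 with no flags. Without DOTALL, '.' cannot cross the
# newline, the pattern never matches, and s comes back as-is.
unchanged = re.sub('<script(.*?)</script>', '', s, count=dotall_value)
print(unchanged == s)

# count really is a cap: only the first 16 of these 20 'a's are replaced.
capped = re.sub('a', 'b', 'a' * 20, count=dotall_value)
print(capped)
```

So nothing errors out; the flag is silently treated as "replace at most 16 times".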

Signature of re.sub is:

re.sub(pattern, repl, string, count=0, flags=0)
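Given that signature, a less error-prone option is probably to pass flags by keyword so it cannot be mistaken for count. The sketch below also shows two alternatives that should give the same result: the inline (?s) flag and a precompiled pattern. The variable names are only illustrative.

```python
import re

s = """<script>
wut</script>"""

pattern = '<script(.*?)</script>'

# Keyword argument: count keeps its default of 0 (replace all).
by_keyword = re.sub(pattern, '', s, flags=re.DOTALL)

# Inline flag: (?s) at the start of the pattern turns on DOTALL.
inline = re.sub('(?s)' + pattern, '', s)

# Compiled pattern: flags are fixed at compile time, so sub() only
# takes repl, string and count.
compiled = re.compile(pattern, flags=re.DOTALL).sub('', s)

print(repr(by_keyword), repr(inline), repr(compiled))
```

All three leave an empty string for the s from the question. Newer Python versions also warn when count or flags are passed positionally to re.sub, which is another reason to prefer the keyword form.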