Shuman Shuman - 1 month ago 7
Python Question

Why does re.VERBOSE prevent my regex pattern from working?

I want to use the following regex to get the modified files from an svn log. It works fine as a single line, but since it's complex, I want to use re.VERBOSE so that I can add comments to it. As soon as I did that, it stopped working. What am I missing here? Thanks!

revision='''r123456 | user | 2013-12-22 11:21:41 -0700 (Thu, 22 Dec 2013) | 1 line
Changed paths:
   A /trunk/abc/python/test/module
   A /trunk/abc/python/test/module/__init__.py
   A /trunk/abc/python/test/module/usage.py
   A /trunk/abc/python/test/module/logger.py

copied from test
'''

import re

# doesn't work
print re.search('''
(?<=Changed\spaths:\n)
((\s{3}[A|M|D]\s.*\n)*)
[(?=\n)|]
''', revision, re.VERBOSE).groups()

# works
print re.search('(?<=Changed\spaths:\n)((\s{3}[A|M|D]\s.*\n)*)[(?=\n)|]', revision).groups()[0]


The string I want to extract is:

   A /trunk/abc/python/test/module
   A /trunk/abc/python/test/module/__init__.py
   A /trunk/abc/python/test/module/usage.py
   A /trunk/abc/python/test/module/logger.py

Answer

Use a raw string literal:

re.search(r'''
            (?<=Changed\spaths:\n)  
            (?:\s{3}[AMD]\s.*\n)*
            (?=\n)    
            ''', revision, re.VERBOSE)

See this fixed Python demo.
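Since the goal was to comment the pattern, here is a minimal sketch of the same regex with inline comments, written for Python 3. The name `changed_paths` is hypothetical, and the sample input assumes each path line starts with three spaces, which is what `\s{3}` in the pattern expects. Because the capturing group was replaced by a non-capturing one, the text is read with `.group()` rather than `.groups()`.

```python
import re

# Sample input; the three leading spaces on each path line are an assumption
# based on the \s{3} in the pattern.
revision = (
    "r123456 | user | 2013-12-22 11:21:41 -0700 (Thu, 22 Dec 2013) | 1 line\n"
    "Changed paths:\n"
    "   A /trunk/abc/python/test/module\n"
    "   A /trunk/abc/python/test/module/__init__.py\n"
    "   A /trunk/abc/python/test/module/usage.py\n"
    "   A /trunk/abc/python/test/module/logger.py\n"
    "\n"
    "copied from test\n"
)

changed_paths = re.compile(r'''
    (?<=Changed\spaths:\n)     # start right after the Changed paths: line
    (?:\s{3}[AMD]\s.*\n)*      # zero or more path lines flagged A, M or D
    (?=\n)                     # stop just before the blank separator line
    ''', re.VERBOSE)

match = changed_paths.search(revision)

# Split the matched block into lines and drop the indentation.
paths = [line.strip() for line in match.group().splitlines()] if match else []
for path in paths:
    print(path)
```

Compiling the pattern once keeps the commented regex separate from the call site, which helps when the same pattern is applied to many log entries.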

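The main issue is that the pattern must be a raw string literal (or you must write \\n instead of \n). In a regular string, Python converts \n to an actual newline character before the regex engine ever sees it, and under re.VERBOSE that newline is treated as formatting whitespace and ignored (see the re.VERBOSE entry in the Python re docs). The single-line version works only because, without re.VERBOSE, a literal newline in the pattern still matches a newline.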
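The effect is easy to reproduce in isolation. The following is a small sketch in Python 3 using a toy pattern; the names `non_raw`, `raw`, and `escaped` are made up for illustration.

```python
import re

text = "a\nb"   # the letter a, a real newline, the letter b

# Regular string: Python turns \n into a newline character, and re.VERBOSE
# then discards it as layout whitespace, so the pattern is effectively "ab".
non_raw = re.compile('a\nb', re.VERBOSE)

# Raw string: the regex engine receives backslash + n and matches a newline.
raw = re.compile(r'a\nb', re.VERBOSE)

# A doubled backslash in a regular string produces the same pattern as the raw one.
escaped = re.compile('a\\nb', re.VERBOSE)

print(non_raw.search(text) is None)        # True: the newline was dropped
print(non_raw.search("ab") is not None)    # True: the pattern behaves like "ab"
print(raw.search(text) is not None)        # True
print(escaped.pattern == raw.pattern)      # True
```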

Also, note that enclosing the lookahead in [...] corrupted it: it became a character class that matches a single (, ?, =, newline, ) or | character. Likewise, a | inside a character class is a literal pipe rather than alternation, so [A|M|D] should be written as [AMD].
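Both character-class points can be checked with a short Python 3 sketch. The test strings and variable names here are invented for illustration only.

```python
import re

# Inside [...], "|" is an ordinary character, not alternation.
with_pipes = re.findall(r'[A|M|D]', 'A|M|D')
without_pipes = re.findall(r'[AMD]', 'A|M|D')
print(with_pipes)       # ['A', '|', 'M', '|', 'D']
print(without_pipes)    # ['A', 'M', 'D']

# Wrapped in [...], the "lookahead" is just a set of characters, and it
# consumes exactly one of them: here the literal "(".
broken = re.search(r'x[(?=\n)|]', 'x(')
print(broken.group())   # x(

# A real lookahead asserts that a newline follows without consuming it.
fixed = re.search(r'x(?=\n)', 'x\ny')
print(fixed.group())    # x
```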
