SethMMorton - 1 month ago
Python Question

Changes in re module between Python 2 and Python 3

I am running my unit test suite with Python 3 on code that was developed under Python 2. All unit tests pass under Python 2 but not under Python 3. It seems there is some change in the implementation of re, and it is a real head scratcher for me. Below is a minimal working example to replicate my problem:

Python 2.7.6 (default, Dec 10 2013, 20:01:46)
>>> import re
>>> a = re.compile('test', re.IGNORECASE)
>>> assert a.flags == re.IGNORECASE
>>> # No output, i.e. assertion passed
>>> a.flags
2
>>> re.IGNORECASE
2





Python 3.3.3 (default, Dec 10 2013, 20:13:18)
>>> import re
>>> a = re.compile('test', re.IGNORECASE)
>>> assert a.flags == re.IGNORECASE
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> a.flags
34
>>> re.IGNORECASE
2





Clearly something is going on that I don't expect! I am assuming that there is some set of default flags that are OR'd together to make a.flags be 34 in Python 3. What I want to know is what these are so that I can make my assertion pass by comparing against the proper flags. As a bonus, what is the purpose of this?

Answer

The following are the regex flags in Python 3.x, printed as their integer values.

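import re

# int() keeps the output numeric on Python 3.6+, where the flags
# are enum members rather than plain integers.
print(int(re.IGNORECASE))
print(int(re.LOCALE))
print(int(re.MULTILINE))
print(int(re.DOTALL))
print(int(re.UNICODE))
print(int(re.VERBOSE))
print(int(re.DEBUG))
print(int(re.ASCII))  # re.A is the short alias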

Output

2
4
8
16
32
64
128
256
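Since each flag occupies its own bit, a combined value such as the 34 from the question can be split back into flag names by testing each bit in turn. The following is only a minimal sketch: FLAG_NAMES and decode_flags are hypothetical names introduced for illustration and are not part of the re module.

```python
import re

# Hypothetical helper data: the flag names printed above, in the same order.
FLAG_NAMES = [
    'IGNORECASE', 'LOCALE', 'MULTILINE', 'DOTALL',
    'UNICODE', 'VERBOSE', 'DEBUG', 'ASCII',
]

def decode_flags(value):
    """Return the names of the re flags whose bits are set in value."""
    return [name for name in FLAG_NAMES if value & int(getattr(re, name))]

print(decode_flags(34))  # ['IGNORECASE', 'UNICODE']
```

The list is spelled out by hand so that the sketch behaves the same whether the flags are plain integers or enum members.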

From the docs,

Strings are immutable sequences of Unicode code points.

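So, for str patterns in Python 3, the re.UNICODE flag is enabled by default. Since you also passed re.IGNORECASE, the two are ORed together, and 2 | 32 gives you 34. To make the assertion pass, compare against re.IGNORECASE | re.UNICODE, or test only the bit you care about with a.flags & re.IGNORECASE, which works on both Python 2 and Python 3.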
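As a rough illustration of how the assertion from the question could be adapted for Python 3, the sketch below checks the flags three ways. The variable names are made up for the example. The bytes case is included because re.UNICODE is only added implicitly for str patterns.

```python
import re

pattern = re.compile('test', re.IGNORECASE)

# On Python 3, a str pattern carries re.UNICODE implicitly,
# so an exact comparison has to include it.
assert pattern.flags == re.IGNORECASE | re.UNICODE

# Testing only the bit of interest does not depend on which
# default flags the interpreter adds.
assert pattern.flags & re.IGNORECASE

# A bytes pattern does not get re.UNICODE, so the exact
# comparison from the question still holds there.
bytes_pattern = re.compile(b'test', re.IGNORECASE)
assert bytes_pattern.flags == re.IGNORECASE

# Passing re.ASCII for a str pattern switches the implicit re.UNICODE off.
ascii_pattern = re.compile('test', re.IGNORECASE | re.ASCII)
assert ascii_pattern.flags == re.IGNORECASE | re.ASCII
```

For a test suite that has to run under both Python 2 and Python 3, the bitwise check is probably the least fragile of the three.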