Homunculus Reticulli - 1 month ago
Python Question

Python 2 string expression not recognised using Python 3

I have a Python script containing this regex:

expression1 = ur'(.*?),\s(.*)\s(sold(?: post-exercise)?|bought|purchased|awarded|exercised|transferred in|transferred out|re-invested)\s*([\d,]*).*price of\s*(\d*.\d+?p)'


The Python 3 parser barfs and complains that it's invalid syntax.

Why is this invalid syntax in Python 3, yet valid in Python 2?
Is there a way I can write it to work with both versions?

Answer

In Python 2, ur strings were not truly raw: \u and \U escapes were still processed instead of being left "raw", which is incompatible with how raw strings behave in Python 3. When the u prefix was reintroduced in Python 3.3 (PEP 414), an explicit decision was made to exclude the ur combination rather than carry over that inconsistent behavior. That is why ur'...' is a syntax error in Python 3.
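To make the difference concrete, here is a small sketch meant to be run on Python 3. The variable names are only for illustration. It shows that a raw string leaves \u alone, that a normal string processes it, and that the ur prefix is rejected at compile time.

```python
# Python 3: a raw string leaves \u escapes untouched.
raw = r'\u0041'
print(len(raw))  # 6 (backslash, u, 0, 0, 4, 1)

# A non-raw string processes the escape into a single character.
cooked = '\u0041'
print(cooked)  # A

# The ur prefix itself is a syntax error on Python 3.
# compile() lets us check that without breaking this script.
try:
    compile("ur'abc'", '<example>', 'eval')
    ur_accepted = True
except SyntaxError:
    ur_accepted = False
print(ur_accepted)  # False on Python 3
```

On Python 2, the same ur'\u0041' literal would have been accepted and would have produced the one-character string u'A'.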

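If you want a raw Unicode string that works in both Python 2 and 3, you'll need a workaround. One option is to write a br raw bytestring and decode it to Unicode with an appropriate codec (ASCII is enough for this pattern). Another is to put from __future__ import unicode_literals at the top of the module and use the plain r prefix. Either way, be careful with \u and \U escapes: with unicode_literals, Python 2 still processes them inside r strings, while Python 3 leaves them untouched.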
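As a rough sketch of the br route, the pattern from the question can be written as a raw bytes literal and decoded once. The sample sentence is made up for illustration, and splitting the literal across lines is only for readability.

```python
import re

# Raw bytes literal, accepted by both Python 2.6+ and Python 3,
# decoded to a Unicode string. ASCII is enough for this pattern.
expression1 = (
    br'(.*?),\s(.*)\s'
    br'(sold(?: post-exercise)?|bought|purchased|awarded|exercised'
    br'|transferred in|transferred out|re-invested)'
    br'\s*([\d,]*).*price of\s*(\d*.\d+?p)'
).decode('ascii')

# Hypothetical input, not taken from the original question.
sample = u'John Smith, Director bought 1,000 shares at a price of 123.45p'

match = re.match(expression1, sample)
groups = match.groups() if match else None
print(groups)
```

The other route from the answer is to keep the literal as r'...' and add from __future__ import unicode_literals as the first statement of the module, so that on Python 2 the plain r prefix produces a Unicode string as well.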