historystamp - 3 months ago 17
Python Question

Regular expression quoting in Python

How should I declare a regular expression?

mergedData = re.sub(r'\$(.*?)\$', readFile, allData)


I'm kind of wondering why this worked. I thought that I needed to use the r'' prefix to pass a regular expression.

mergedData = re.sub("\$(.*?)\$", readFile, allData)


What does "\$" result in in this case? Why? I would have thought "$".

Answer

I thought that I need to use the r'' to pass a regular expression.

The r prefix before a string literal marks it as a raw string: the usual escape sequences such as \n or \r are no longer translated into a newline or carriage return, but remain a \ followed by n or r. To get a single \ you write just \ in a raw string literal, whereas in a normal string literal you must double it as \\. This is why raw strings are usually used to write regular expressions [1]: they reduce confusion when reading the code. With a normal string literal you would have to escape twice, once for the string literal and once more for the regex.
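As a small illustration of the point above, the following sketch compares a normal and a raw literal, and shows the double escaping needed to match one literal backslash. The variable names and the sample text are made up for the example.

```python
import re

# A normal literal translates \n into a single newline character.
newline = "\n"
# A raw literal keeps the backslash: two characters, \ and n.
raw_newline = r"\n"
print(len(newline), len(raw_newline))  # 1 2

# To match ONE literal backslash, the regex engine must receive \\ (two characters).
regex_raw = r"\\"        # raw literal: written once per regex escape
regex_normal = "\\\\"    # normal literal: every backslash doubled again
print(regex_raw == regex_normal)  # True

sample = "a\\b"  # the three characters a, \, b
match = re.search(regex_raw, sample)
print(match.span())  # (1, 2)
```

Both spellings produce the same pattern; the raw form is just easier to read.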

What does "\$" result in this case? Why? I would have thought "$"

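In a normal Python string literal, if \ is followed by a character that does not form a recognized escape sequence, the \ is preserved. Therefore "\$" results in \ followed by $, which is exactly the same string as r'\$'. (Recent Python versions emit a warning for such unrecognized escapes, so the raw string is still the better choice.)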

This behavior differs slightly from how C/C++ or JavaScript handle a similar situation: there the \ is treated as an escape for the next character, and only that character remains. So "\$" in those languages is typically interpreted as $.
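The Python behavior described above can be checked with a short sketch. The helper parse_literal is hypothetical (not part of any library); it evaluates the literal from its source text so that the warning newer Python versions raise for \$ can be silenced. The sample text and the lambda standing in for the question's readFile are also assumptions.

```python
import ast
import re
import warnings


def parse_literal(source: str) -> str:
    """Hypothetical helper: evaluate string-literal source text."""
    with warnings.catch_warnings():
        # Newer Python versions warn about unrecognized escapes such as \$.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


normal = parse_literal(r'"\$"')   # what "\$" means in source code
raw = parse_literal(r'r"\$"')     # what r"\$" means in source code
print(normal == raw, len(normal))  # True 2

# Both spellings of the question's pattern are therefore the same regex.
pattern_normal = parse_literal(r'"\$(.*?)\$"')
pattern_raw = r"\$(.*?)\$"
merged = re.sub(pattern_raw, lambda m: m.group(1).upper(), "a $b$ c")
print(pattern_normal == pattern_raw, merged)  # True a B c
```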

Footnote

1: There is a small defect in the design of raw strings in Python, though; see "Why can't Python's raw string literals end with a single backslash?"
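A minimal sketch of that limitation, assuming a made-up helper compiles that only reports whether some source text is valid Python. The source texts are passed as strings so the example itself stays runnable.

```python
def compiles(source: str) -> bool:
    """Hypothetical helper: True if the source text is a valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Source text r"abc\" : the backslash keeps the quote from closing the literal.
print(compiles('r"abc\\"'))    # False

# Source text r"abc\\" : valid, but the value ends in TWO backslashes.
print(compiles('r"abc\\\\"'))  # True

# One common workaround: append the trailing backslash from a normal literal.
workaround = r"abc" + "\\"
print(len(workaround))  # 4
```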
