Nikolay Derkach - 2 months ago
JSON Question

Python: read regexps from JSON

I have a JSON file where I store a mapping, which contains regexes, like the ones below:

"F(\\d)": "field-\\\\1",
"FLR[ ]*(\\w)": "floor-\\\\1",


To comply with the JSON standard I escape the backslashes; the actual regexps should contain \d, \w, and \\1.

Once I read this JSON with json.load(), I still need to post-process the resulting dictionary to get the correct regexps: I need to replace each \\ with \. What's the best way to do this?

So far I have tried both re.sub() and str.replace(), and in both cases it's not clear how to represent a single backslash in the substitution.

For example, I don't understand why the following doesn't produce a single backslash:

In [76]: "\\\\d".replace("\\\\", "\\")
Out[76]: '\\d'

Answer

It does produce a single backslash; that backslash is escaped when displayed. The REPL shows the repr() of the string, which escapes backslashes so that any string, including one with characters that have no printable form, can be displayed unambiguously. Otherwise you wouldn't know whether a backslash was a literal character or the start of an escape sequence.

This can be demonstrated by checking the individual characters:

# In a terminal/REPL:
>>> "\\\\d".replace("\\\\", "\\")[0]
'\\'
>>> "\\\\d".replace("\\\\", "\\")[1]
'd'
>>> "\\\\d".replace("\\\\", "\\")[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
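
Another way to check the same thing, sketched here as a short script instead of a REPL session: len() counts the characters actually in the string, and print() writes the string itself, not its escaped repr(). The names original and result are only for illustration.

```python
# The literal "\\\\d" holds three characters: backslash, backslash, d.
original = "\\\\d"
result = original.replace("\\\\", "\\")

print(len(original))  # character count before the replacement
print(len(result))    # character count after the replacement
print(result)         # print() writes the string itself
print(repr(result))   # repr() gives the escaped form the REPL displays
```

If the two lengths differ by one, the replacement really did collapse the doubled backslash into a single one, whatever the REPL echo looks like.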

One tip for writing regexes in Python: use raw strings. If you put an r before the opening quote of a string literal, backslashes are kept as literal characters instead of starting escape sequences (except that a backslash before a quote still keeps that quote from ending the string). r"\n" is a string containing two characters, a \ and an n, equivalent to "\\n". Raw strings are very helpful for regexes and anything else where backslashes need to pass through unchanged. See also: What exactly do “u” and “r” string flags do in Python, and what are raw string literals?
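
As a rough sketch of how this applies to the original question: once json.loads() (or json.load()) has decoded the JSON escapes, the strings should already be usable as regexes with no further backslash substitution. The JSON text and the normalize() helper below are hypothetical. The sketch also assumes the replacement is meant to be a backreference, so it is written as \\1 in the JSON (decoding to \1) instead of the \\\\1 from the question, which decodes to \\1 and which re.sub() treats as a literal backslash followed by 1.

```python
import json
import re

# Raw and escaped literals that denote the same string.
assert r"\d" == "\\d"
assert len(r"\n") == 2

# Hypothetical JSON text; inside JSON each regex backslash is written as \\.
json_text = r'{"F(\\d)": "field-\\1", "FLR[ ]*(\\w)": "floor-\\1"}'

# After decoding, the keys are F(\d) and FLR[ ]*(\w); the values end in \1.
mapping = json.loads(json_text)


def normalize(text, mapping):
    """Apply each pattern -> replacement pair in turn (illustrative helper)."""
    for pattern, replacement in mapping.items():
        text = re.sub(pattern, replacement, text)
    return text


print(normalize("F3", mapping))
print(normalize("FLR 2", mapping))
```

Using a raw string for json_text keeps the doubled backslashes exactly as they would appear in the file, so the only unescaping that happens is the one done by the JSON decoder.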