JCode JCode - 1 year ago 90
Python Question

Python: "unexpected end of regular expression" during re.compile, empty brackets

To summarize, I have a statement like this:

markers = ['x'] # some list
re.compile(r" *[{}].*(?=\n|$)".format('\\'.join([''] + markers)))

In most cases it works fine, unless markers is empty, in which case the regex pattern looks like this:

pattern = ' *[].*(?=\\n|$)'

Why does it have a problem with an empty character set? What is the workaround to make it work for an empty markers list?


Credits to: Martijn Pieters, Wiktor Stribiżew and Amadan.

To summarize:

  • An empty character set doesn't exist in regex. [] is parsed as the start of a set whose first member is a literal ], so the parser still expects a closing ], never finds one, and raises the error.

  • Checking for empty markers must be done before compiling this pattern, to avoid invalid empty brackets.

  • .*(?=\n|$) has a redundant (?=\n|$) and can be simplified to .*

  • To escape special characters inside brackets efficiently, it's better to use re.escape.
Adding things up, the solution to my problem is:

if markers:
    re.compile(r" *[{}].*".format(re.escape(''.join(markers))))
# something
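
The same idea can be wrapped in a function so the empty check and the escaping always travel together. This is only a sketch: the name build_pattern, the choice to return None for an empty list, and the comment-stripping usage are assumptions made for illustration, not something stated in the question.

```python
import re

def build_pattern(markers):
    """Compile the marker pattern, or return None when there are no markers.

    Hypothetical helper; returning None for an empty list is an assumption.
    """
    if not markers:
        return None  # the caller decides what "no markers" should mean
    return re.compile(r" *[{}].*".format(re.escape("".join(markers))))

# re.escape backslash-escapes "]", "^" and "-" so they stay literal in the set.
pattern = build_pattern(["#", "]", "^", "-"])
stripped = pattern.sub("", "value = 1 # trailing comment")

# An escaped "-" between two letters does not turn into a range.
no_range = build_pattern(["a", "-", "z"]).search("m")
```

With these inputs, stripped holds the line without the marker and everything after it, and no_range is None because "m" is neither "a", "-" nor "z".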

Answer Source

You may check if the markers list is not empty at the very beginning, then, only escape the characters that must be escaped in the character class: ^, \, ], [, -.

Note that if the markers list is empty, the pattern becomes *.*, basically accepting any line. You can match it with "^.*$".

Here is my suggestion:

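import re

markers = ['x', ']', '[', '-', '^', '\\']  # some list
# markers = []  # uncomment to test the empty case
if markers:
    # Escape only the characters that are special inside a character class
    escaped = [re.sub(r"[][^\\-]", r"\\\g<0>", x) for x in markers]
    pat = r" *[{}].*".format("".join(escaped))
    p = re.compile(pat)
else:
    # No markers: fall back to a pattern that accepts any line
    p = re.compile("^.*$")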
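
To see what the targeted escaping produces, the substitution can be pulled out into a small helper. This is a sketch under the assumption that only the five characters listed above need a backslash; escape_for_class is a hypothetical name.

```python
import re

def escape_for_class(marker):
    """Backslash-escape only the characters ^ \\ ] [ - (hypothetical helper)."""
    return re.sub(r"[][^\\-]", r"\\\g<0>", marker)

# "x" is left alone; every other marker gets a leading backslash.
escaped = "".join(escape_for_class(m) for m in ["x", "]", "[", "-", "^", "\\"])

# The escaped text is now safe to drop between square brackets.
pattern = re.compile("[{}]".format(escaped))
hits = pattern.findall(r"a]b[c-d^e\f x")
```

Compared with re.escape, this leaves characters such as "#" or "." untouched, which is harmless inside a character class and keeps the generated pattern shorter.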



Also, .*(?=\n|$) can actually be reduced to .*, since . matches any character except a newline (it can also match a CR character), so .* always matches everything up to the next \n or the end of the string.
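
A quick way to convince yourself of this is to run both forms over the same multi-line text and compare the matches. A minimal sketch, with a made-up sample string:

```python
import re

text = "first line\nsecond line\n\nlast"  # sample input, chosen for illustration

# Greedy .* already stops at a newline or at the end of the string,
# so the lookahead can never reject a match that .* would accept.
with_lookahead = re.findall(r".*(?=\n|$)", text)
without_lookahead = re.findall(r".*", text)

same = with_lookahead == without_lookahead

# Ignoring the empty matches, what remains is the content of each line.
lines = [m for m in without_lookahead if m]
```

Both calls return identical lists (including the same empty matches), so dropping the lookahead changes nothing about what gets matched.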