jskmr jskmr - 2 months ago 10
Python Question

python - regex for matching text between two characters while ignoring backslashed characters

I am trying to use Python to get the text between two dollar signs ($), where a dollar sign preceded by a backslash (\$) must not count as a delimiter (this is for a LaTeX rendering program). So if this is given

$\$x + \$y = 5$ and $3$


This should be the output:

['\$x + \$y = 5', ' and ', '3']


This is my code so far:

import re

def parse_latex(text):
    return re.findall(r'(^|[^\\])\$.*?[^\\]\$', text)

print(parse_latex(r'$\$x + \$y = 5$ and $3$'))


But this is what I get:

['', ' ']


I am not sure how to proceed from here.

Answer

You can use this lookaround-based regex, which skips over escaped characters:

>>> import re
>>> text = r'$\$x + \$y = 5$ and $3$'
>>> re.findall(r'(?<=\$)([^$\\]*(?:\\.[^$\\]*)*)(?=\$)', text)
['\\$x + \\$y = 5', ' and ', '3']
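
To tie this back to the question, here is a minimal sketch that drops the same pattern into the asker's `parse_latex` function. The constant name `DOLLAR_SPAN` is made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Hypothetical constant name; the pattern is the one shown in the answer.
DOLLAR_SPAN = re.compile(r'(?<=\$)([^$\\]*(?:\\.[^$\\]*)*)(?=\$)')

def parse_latex(text):
    """Return the pieces of text found between consecutive $ delimiters,
    treating backslash-escaped characters inside a piece as literal."""
    return DOLLAR_SPAN.findall(text)

print(parse_latex(r'$\$x + \$y = 5$ and $3$'))
# ['\\$x + \\$y = 5', ' and ', '3']
```

One caveat worth knowing: the lookbehind only checks that the previous character is `$`, not whether that `$` is itself escaped. An escaped `\$` that appears before the first real delimiter is therefore treated as an opening boundary, so `parse_latex(r'\$5 and $x$')` gives `['5 and ', 'x']`. If such input is possible, the text probably needs extra handling.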

RegEx Breakdown:

(?<=\$)           # Lookbehind to assert previous character is $
(                 # start capture group
   [^$\\]*        # match 0 or more characters that are neither $ nor \
   (?:            # start non-capturing group
      \\.         # match \ followed by any character (the escaped character)
      [^$\\]*     # match 0 or more characters that are neither $ nor \
   )*             # end non-capturing group, repeated 0 or more times
)                 # end capture group
(?=\$)            # Lookahead to assert next character is $
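
The same commented layout can be kept in the code itself by compiling the pattern with `re.VERBOSE`. This is a sketch of that option, not part of the original answer; the name `VERBOSE_PATTERN` is hypothetical. The comments inside the pattern spell out "backslash" because a literal backslash at the end of a verbose-mode comment swallows the following newline.

```python
import re

# Hypothetical name. In verbose mode, whitespace and "#" comments outside
# character classes are ignored, so the pattern can be laid out on several lines.
VERBOSE_PATTERN = re.compile(r'''
    (?<=\$)            # lookbehind: previous character is a dollar sign
    (                  # start capture group
        [^$\\]*        # zero or more characters that are neither dollar nor backslash
        (?:            # start non-capturing group
            \\.        # a backslash followed by any one character
            [^$\\]*    # zero or more characters that are neither dollar nor backslash
        )*             # repeat the non-capturing group zero or more times
    )                  # end capture group
    (?=\$)             # lookahead: next character is a dollar sign
''', re.VERBOSE)

text = r'$\$x + \$y = 5$ and $3$'
print(VERBOSE_PATTERN.findall(text))
# ['\\$x + \\$y = 5', ' and ', '3']
```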