Dan Dan - 3 months ago 16
Python Question

Regex to ignore pattern found in quotes (Python or R)

I am trying to create a regex that finds instances of a string containing an unspaced forward slash (/), for example:

some characters/morecharacters

I have come up with the expression below, which finds word characters or a closing parenthesis before my / and word characters or an opening parenthesis afterwards.


This works great for most situations; however, I come unstuck when the / is enclosed in quotes. In that case I'd like it to be ignored. I have seen a few different posts here and here, but I can't quite get them to work in my situation.

What I'd like is for the first three cases identified below to match and the last case to be ignored, allowing me to extract item 1 and item 3.

some text/more text
"dont match/me"


It ain't pretty, but this will do what you want:


Demo on Regex101

Let's break it down a bit:

  • (?<!")(?:\(|\b) will match either an open bracket or a word boundary, as long as it's not preceded by a quotation mark. It does this by employing a negative lookbehind.
  • [^"\n]+ will match one or more characters, as long as they're neither a quotation mark nor a line break (\n).
  • \/ will match a literal slash character.
  • Finally, (?:\)|\b)(?!") will match either a closing bracket or a word boundary as long as it's not followed by a quotation mark. It does this by employing a negative lookahead. Note that the (?:\)|\b) will only work 100% correctly in this order - if you reverse them, it'll drop the match on the bracket, because it encounters a word boundary before it gets to the bracket.
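
To try the pieces above in Python, here is a minimal sketch that strings them together in the order listed. The second [^"\n]+ after the slash is an assumption (the breakdown names that piece only once), and the bracketed sample line and the helper name find_unquoted are hypothetical, added for illustration. It has only been checked against the sample lines shown, so treat it as a starting point rather than a general solution.

```python
import re

# Assembled from the pieces in the breakdown. The second [^"\n]+ (after the
# slash) is an assumption; the breakdown lists that piece only once.
PATTERN = re.compile(r'(?<!")(?:\(|\b)[^"\n]+/[^"\n]+(?:\)|\b)(?!")')

# Same pattern with the final alternation reversed, to illustrate the
# ordering note in the last bullet.
REVERSED = re.compile(r'(?<!")(?:\(|\b)[^"\n]+/[^"\n]+(?:\b|\))(?!")')


def find_unquoted(line):
    """Return the first match in `line`, or None if nothing matches."""
    m = PATTERN.search(line)
    return m.group(0) if m else None


samples = [
    'some text/more text',     # from the question
    '"dont match/me"',         # from the question
    '(some text/more text)',   # hypothetical bracketed case
]

for s in samples:
    print(repr(s), '->', find_unquoted(s))

# With the alternation reversed, the word boundary wins and the closing
# bracket is left out of the match.
print(REVERSED.search('(some text/more text)').group(0))
```

On the bracketed sample, the reversed variant stops at the word boundary after "text", so the closing bracket is dropped from the match, which is the behaviour the last bullet warns about.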