Dan Dan - 24 days ago 6
Python Question

Regex to ignore pattern found in quotes (Python or R)

I am trying to create a regex that allows me to find instances of a string where I have an unspaced /, e.g.:

some characters/morecharacters


I have come up with the expression below, which finds a word character or closing parenthesis immediately before my / and a word character or opening parenthesis immediately after it.

(\w|\))/(\(|\w)


This works great for most situations; however, I am coming unstuck when I have a / enclosed in quotes. In this case I'd like it to be ignored. I have seen a few different posts on this, but I can't quite get them to work in my situation.
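To make the problem concrete, here is a minimal Python sketch that runs the expression above against the four sample lines from this question. The variable names are illustrative and not part of the original post. The quoted line matches as well, which is the behaviour to avoid.

```python
import re

# The expression from the question: a word character or ")" before the slash,
# and "(" or a word character after it.
original = re.compile(r'(\w|\))/(\(|\w)')

samples = [
    'some text/more text',
    '(formula)/dividethis',
    'divideme/(byme)',
    '"dont match/me"',
]

# Record whether each sample line contains a match.
hits = {line: bool(original.search(line)) for line in samples}

for line, matched in hits.items():
    print(f'{line!r}: {"match" if matched else "no match"}')
```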

What I'd like is for the first three cases identified below to match and the last case to be ignored, allowing me to extract item 1 and item 3.

some text/more text
(formula)/dividethis
divideme/(byme)
"dont match/me"

Answer

It ain't pretty, but this will do what you want:

(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")

Demo on Regex101
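
As a rough check in Python, the sketch below applies the pattern to the four sample lines from the question using re.findall. The sample text and variable names are assumptions made for illustration.

```python
import re

# The answer's pattern, unchanged. The escaped slash (\/) is harmless in Python.
pattern = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")')

text = '''some text/more text
(formula)/dividethis
divideme/(byme)
"dont match/me"'''

# The pattern has no capturing groups, so findall returns the full matches.
matches = pattern.findall(text)
print(matches)
```

Under these assumptions the first three lines are returned whole and the quoted line is skipped.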

Let's break it down a bit:

  • (?<!")(?:\(|\b) will match either an open bracket or a word boundary, as long as it's not preceded by a quotation mark. It does this by employing a negative lookbehind.
  • [^"\n]+ will match one or more characters, as long as they're neither a quotation mark nor a line break (\n). It appears twice, once on each side of the slash.
  • \/ will match a literal slash character.
  • Finally, (?:\)|\b)(?!") will match either a closing bracket or a word boundary, as long as it's not followed by a quotation mark. It does this by employing a negative lookahead. Note that the alternatives in (?:\)|\b) only work correctly in this order: if you reverse them, the match drops the closing bracket, because the word boundary just before the bracket succeeds first and the regex never tries to consume the bracket.
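
The ordering point in the last bullet can be checked with a small Python sketch. The reversed pattern below is a hypothetical variant written only for comparison; it is not part of the answer.

```python
import re

line = 'divideme/(byme)'

# Order used in the answer: try the closing bracket first, then the word boundary.
bracket_first = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")')

# Hypothetical variant with the final alternatives swapped.
boundary_first = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\b|\))(?!")')

kept = bracket_first.search(line).group()
dropped = boundary_first.search(line).group()

print(kept)
print(dropped)
```

With the swapped order, the second result stops at the word boundary after "byme" and loses its closing bracket.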