AlexStarbuck AlexStarbuck - 1 year ago 48
Javascript Question

How to match sth. "preceeded / followed by" something with regular expression?

I am working with a Google Sheets document in which I need to manipulate strings and extract certain parts of them. These strings have exactly the following form, to the character:

Ad name:; 970x250

I need to extract two "fields":

  • Zagreb

  • 970x250

Obviously, the first one is always surrounded by
which makes things a bit easier and the other one is either surrounded by
OR preceded by
"; "
if I were to capture it from the end of the string.

I am trying to use Google Sheets proprietary
formula (read more about it here) but I must be doing something wrong. If it matters, Google products use RE2 RegEx "flavor".

Here is what I have so far:

=REGEXEXTRACT(text, "(?:_)[A-Za-z]+(?:\+).*")

This one returns:


so I need to lose the "_" and "+". I understand that for this type of operation (extracting text between certain characters) look-arounds should be used but I am still quite unfamiliar with these. Also, I understand that some of them (negative look-behind most notably) do not work with JavaScript.

This is attempt 2:

=REGEXEXTRACT(text, ".*[A-Za-z]+(?=\+.*)")

This one just throws a
error. I find these two resources invaluable for learning RegEx:

but since I am working on a real assignment and am on a tight deadline, I can't afford to study this in detail right now.

Any help appreciated!

Answer Source

In Google Speadsheets, you may use a capturing group around the piece of text you need to extract from a specific context. Thus, just place ( and ) around those pattern parts.

To get Zagreb, use =REGEXEXTRACT(F15,"_([a-zA-Z]+)\+") and to get the resolution, use =REGEXEXTRACT(F15,";\s*([0-9x]+)$").

Pattern 1:

  • _ - an underscore that is just matched
  • ([a-zA-Z]+) - Capture group 1 matching one or more ASCII letters
  • \+ - a literal +.

Pattern 2

  • ;\s* - a ; and 0+ whitespaces
  • ([0-9x]+) - Capture group 1 matching one or more digits or x
  • $ - at the end of the cell contents.

In both cases, you only get the substrings captured into Group 1.

enter image description here

More information about capturing groups can be found here.