AlienDeg AlienDeg - 3 months ago 13
SQL Question

Replacing URL with part of that URL in Big Query

I'm trying to write SQL in big query to replace strings below with middle part of the string

sports.xxxx.com/en-gb/betting/football/vasdas


sports.xxxx.com/en-gb/betting/basketball/blablabla


sports.xxxx.com/en-gb/betting/football/


sports.xxxx.com/en-gb/betting/golf


So result would be

football


basketball


football


golf


I tried to use
SELECT
REGEXP_REPLACE('sports.xxxx.com/en-gb/betting/football', '.*', '(?<=\b\/betting\/)(\w+)')

but I get an error
Error: Invalid string literal: '(?<=\b\/betting\/)(\w+)'

Any ideas how I can make it?

Answer

The error is due to the single backslashes you used in the string literal that formed invalid escape seuqences.

You need a capturing group with REGEX_EXTRACT:

SELECT REGEXP_EXTRACT('spor‌​‌​ts.ladbrokes.com/en‌​-g‌​b/betting/footbal‌​l', r'\b/betting/(\w‌​+)‌​')

See the regex demo

Note also that Big Query regex syntex is powered by the RE2 regex library that does not support lookarounds at all (and you tried to use a positive loobehind (?<=....)).

The pattern details:

  • \b - a word boundary (trailing). It might be unnecessary, you may remove it safely
  • /betting/ - a literal substring /betting/ (note the forward slashes are not special for regex, they need no escaping)
  • (\w‌​+)‌ - Group 1, the part of the string that will be returned by the REGEX_EXTRACT, capturing 1 or more word chars (letters, digits or underscores).
Comments