Graphth Graphth - 2 months ago 21
SQL Question

Teradata regular expressions, look behind

I have a field, Simplified_Description and I'm looking for patterns in it. Specifically, I'm looking for a pattern like 6 X 8 or 6X8 or 600X800. I want to pull out the first and second numbers into new fields. I've been able to get the first number (with much help) using a look-ahead.

REGEXP_substr(Simplified_Description, '[0-9]+(?= {0,1}[X] {0,1}[0-9]+)') AS FirstNum,

When I try to get the second number by changing the look-ahead to a look-behind (by simply adding in a "<"),

REGEXP_substr(Simplified_Description, '[0-9]+(?<= {0,1}[X] {0,1}[0-9]+)') AS SecondNum

I now get an error

SELECT Failed. [9134] The pattern specified is not a valid pattern.

I am a complete newb on regular expressions, especially on look-ahead and look-behind, so it's possible I have some extremely simple error, but I can't figure it out as what I'm doing appears to be the correct syntax.

Answer Source

You may use the following regex to extract the first number:

REGEXP_substr(Simplified_Description, '\d+(?=\s*X\s*\d)') AS FirstNum

and this regex for the second number:

REGEXP_substr(Simplified_Description, '\d+\s*X\s*\K\d+') AS SecondNum

See the regex 1 and regex 2 demo.

Patter 1 details

  • \d+ - 1 or more digits that are followed with...
  • (?=\s*X\s*\d) - a sequence of patterns:
    • \s* - 0+ whitespaces
    • X - an X char
    • \s* - 0+ whitespaces
    • \d - a digit.

Pattern 2 details

  • \d+ - 1 or more digits
  • \s*X\s* - an X char enclosed with any 0+ whitespace chars
  • \K - a match reset operator that omits (removes) the text matched so far from the match value
  • \d+ - 1 or more digits.