MalumAtire832 MalumAtire832 - 6 months ago 28
SQL Question

MySQL Matching whitespace in Regex

I'm having a problem with the regex syntax in Mysql when it comes to matching whitespaces in Strings.

I have a database of Zipcodes in the format:

1111 AA CITYNAME
or
1111 CITYNAME
.

From this, I want to extract the zipcode and the cityname, I used the following code:

DROP FUNCTION IF EXISTS GET_POSTALCODE;
CREATE FUNCTION GET_POSTALCODE(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
IF input LIKE '^[1-9][0-9]{3}[[:blank:]][A-Z]{2}[[:blank:]]%'
THEN
SET output = SUBSTRING(input, 1, 7);
ELSE
SET output = SUBSTRING(input, 1, 4);
END IF;
RETURN output;
END


I would expect the result for a input string of
9741 NE Groningen
to be split up into
9741 NE
and
Groningen
.

But instead I get
9741
and
NE Groningen
.

I have tried all sorts of things to match the whitespace, which I think is the problem. I tried:


  • [[:blank:]]

  • [:blank:]

  • [[:space:]]

  • [:space:]

  • and the
    \s
    method



[:space:]
Should match all whitespaces, but yet again, same result.

Nothing I try seems to work, could you maybe point me in the right direction?

Thank you!

Answer

Foreward

You can enable PCRE if you use this library

Description

^([0-9]{4}(?:[[:blank:]]+[a-z]{2}(?=[[:blank:]]))?)[[:blank:]](.*$)

Regular expression visualization

This regular expression will do the following:

  • find the 4 digit codes followed by an optional two characters
  • match the rest of the string which should be a city name

Example

Live Demo

https://regex101.com/r/sE3xN7/4

Sample text

Note your examples only had 4 digit codes, so I took the liberty of adding an additional digit

1111 AA CITYNAME1
2222 CITYNAME2
3333 Las Vegas
4444 BB Las Vegas
9741 NE Groningen

Sample Matches

MATCH 1
1.  [0-7]   `1111 AA`
2.  [8-17]  `CITYNAME1`

MATCH 2
1.  [18-22] `2222`
2.  [23-32] `CITYNAME2`

MATCH 3
1.  [33-37] `3333`
2.  [38-47] `Las Vegas`

MATCH 4
1.  [48-55] `4444 BB`
2.  [56-65] `Las Vegas`

MATCH 5
1.  [66-73] `9741 NE`
2.  [74-83] `Groningen`

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      [[:blank:]]+             whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
      [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
----------------------------------------------------------------------
        [[:blank:]]               whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [[:blank:]]                 whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    $                        before an optional \n, and the end of a
                             "line"
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------