Mark Heckmann Mark Heckmann - 1 month ago 5
SQL Question

extract, match and combine information from string into array in PostgreSQL

Problem

(sorry, had problems, cutting it into smaller pieces):

I have the following one column table

A


s
-------------------------------------------
(AAB) Some name1 1234 (XY*) Hello world 12
(BB) Just one 123


and a lookup table
L


a1 | a2
-----+-----
XY* | XY
AAB | A2B


I want to :


  1. retrieve the parts in the brackets:
    (AAB, XY*)

  2. get the corresponding
    a2
    entry from the lookup table:
    (A2B, XY)

  3. retrieve the number part of the entries:
    (1234, 12)

  4. combine the results from 2 and 4 into an array:
    (A2B1234, XY12)



The results would be

s
-------------------------------------------
{A2B1234, XY12}
{BB123}





What I tried

This is how far I got:


  • For 1:

    SELECT array_to_string(regexp_matches(s, '\((.*?)\)', 'g'), '') as in_bracket
    FROM A;

  • For 3:

    SELECT array_to_string(regexp_matches(s, '(\d+)', 'g'), '') as numbers
    FROM A;



Here the struggle begins. How can I


  1. Replace the values (
    in_bracket
    ) with the lookup values on the fly?

  2. Combine the results from the two select clauses into one array per row?






The data:

CREATE TABLE A (
s VARCHAR
);
INSERT INTO a VALUES
('(AAB) Some name1 1234 (BB) More text 99 (XY*) Hello world 12'),
('(BB) Just one 123');

CREATE TABLE L (
a1 VARCHAR(4),
a2 VARCHAR(4)
);
INSERT INTO L VALUES
('XY*', 'XY'),
('AAB', 'A2B');

Answer

I came to this form:

WITH parts AS (
    SELECT
        id,
        (regexp_matches(s, '\(([^\)]+)\)[^0-9]*([0-9]+)', 'g'))[1] AS search,
        (regexp_matches(s, '\(([^\)]+)\)[^0-9]*([0-9]+)', 'g'))[2] AS number
    FROM
        A
)
SELECT
    array_agg(L.a2 || parts.number)
FROM
    parts
    JOIN L
    ON (L.a1 = parts.search)
GROUP BY
    parts.id;

The output is:

{A2B1,XY12}
{BB123}

There are some thing that needs to be clarified:

  • what exactly is "number part of the entries" - in given example there is "name1", so my regex matches the "1" - note that A2B1 is returned instead of A2B1234
  • in table L there is no entry for BB - I have added it manually to my test L table - should just text in brackets be used in such situation?
  • I needed to group result from table A, and I have assumed that there is id column - you can also use s column for that provided that it is quniue

You may need to adjust the regex I've used. Here is an explanation to my version:

\(         - '('
([^\)]+)   - match of sequence of any char except ')' - this will be for example `AAB` or `XY*`
\)         - ')'
[^0-9]*    - any sequence of chars except digits (no-matching group)
([0-9]+)   - match of non-empty sequence of digits
Comments