m m m m - 6 months ago 67
SQL Question

Extract substring with a specific pattern in MySql

I have a text field which looks like:

option[A]sum[A]g3et[B]


I want to get the text which is inside the
[ ]
without duplicates.
Meaning to get:

A
B


There can't be a case of double like
[ [ ] ]
.

I know this is a horrible way to save data in databases. I can not change how the data is saved. I just need to get a very specific (one time) information from this column.

I tried to do:

SELECT substring_index(substring_index(sentence, '[', -1),']', 1)
FROM (SELECT 'THIS[A] IS A TEST' AS sentence) temp;


This gives me
A
, but it will not work for many
[]
.

I thought of using regex however I don't know how many
[ ]
I have.

How do I do that?

Answer

It is not job for DB but it is possible:

CREATE TABLE tab(id INT, col VARCHAR(100));           
INSERT INTO tab(id, col) 
VALUES (1, 'option[A]sum[A]g3et[B]'), (2, '[Cosi]sum[A]g3et[ZZZZ]');      

SELECT DISTINCT *
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val
    FROM tab t 
    CROSS JOIN 
    (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
ORDER BY ID;

SqlFiddleDemo

Note:

Depending on col maximum length you may need to generate more numbers in CROSS JOIN section. For now it is up to 100.

Output:

enter image description here

How it works:

  1. Generate number table with CROSS JOIN
  2. Split string based on ] as delimeter
  3. RIGHT(val, LENGTH(val) - LOCATE('[', val)) remove the part up to [
  4. filter out empty records
  5. Get only DISTINCT values

Inner most query:

╔════╦══════════╗
║ id ║   val    ║
╠════╬══════════╣
║  1 ║ option[A ║
║  1 ║ sum[A    ║
║  1 ║ g3et[B   ║
║  1 ║          ║
╚════╩══════════╝

Second subquery:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ A   ║
║  1 ║ B   ║
║  1 ║     ║
╚════╩═════╝

And outermost query:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ B   ║
╚════╩═════╝

I need the result of query per row.. not combined

So add simple:

WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  AND t.id = ?

EDIT 2:

see http://sqlfiddle.com/#!9/8ee95/1 your query works partially for my data. I also changed the type to longtext.

You want to parse JSON in MySQL. As I said before parse and get value in application layer. This answer is only for demo/toys purpose and will have very low performamce.

If you still insist on SQL solution:

SELECT id, val,s.n
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val,n
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val, n.n
    FROM (SELECT id, REPLACE(col, '[]','') as col FROM tab) t
    CROSS JOIN 
    (
     SELECT e.N * 10000 + d.N * 1000 + c.N * 100 + a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e

    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
GROUP BY id, val
HAVING n <> MAX(n)
ORDER BY id,n;

SqlFiddleDemo

Output:

╔═════╦═════════════╦════╗
║ id  ║    val      ║ n  ║
╠═════╬═════════════╬════╣
║  1  ║ CE31285LV4  ║  1 ║
║  1  ║ D32E        ║  3 ║
║  1  ║ GTX750      ║  5 ║
║  1  ║ M256S       ║  7 ║
║  1  ║ H2X1T       ║  9 ║
║  1  ║ FMLANE4U4   ║ 11 ║
╚═════╩═════════════╩════╝

EDIT 3:

What exactly is done there? Why do you need n

CROSS JOIN and entire subquery is only tally table. That'is all. If MySQL has function to generate number sequence (like generate_series or prepopulated number table there is no need for CROSS JOIN.

Number table is needed for SUBSTRING_INDEX:

SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

Comments