user1610350 user1610350 - 6 months ago 27
MySQL Question

MySQL Regular Expression to match multiple CSV rows containing certain fields in order

I'd like to use MySQL's

REGEXP
to match multiple numbers of a csv in a MySQL query.

I am trying to identify if a CSV string contains numbers 2 and 9. The order matters in the result. They could be back to back, and be present at the beginning and/or end.

The below CSV strings should all produce positive result:


1,2,3,4,5,6,7,8,9,10

2,9,1,2,3,4,5,10

1,2,3,5,9


These CSV strings should not:


9,2,3,4,5,10 - (2 doesn't exist before 9)

2,1,2,3,4,5,10 - (9 not present)



I've tried to match what I am expecting in the pattern by the following logic:


  1. matching anything or nothing

  2. match the number 2 at least one time

  3. matching anything or nothing

  4. match the number 9 at least one time

  5. matching anything or nothing



My expression that is close, but not working is:

REGEXP '.*([^0-9][2][^0-9])+.*([^0-9][9][^0-9])+.*'


The above expression fails to match if 2 is the very beginning or 9 the very end of the string. Thanks for the input.

Answer

MySQL

Since we're using MySQL REGEXP, we can take this approach:

SELECT * FROM table WHERE field REGEXP '[[:<:]]2[[:>:]].*[[:<:]]9[[:>:]]'

Assuming we have only one line of CSV in each row, this will match:

1,2,3,4,5,6,7,8,9,10
2,9,1,2,3,4,5,10
1,2,3,5,9

And not match:

9,2,3,4,5,10
2,1,2,3,4,5,10
20,9,1,2,3,4,5,10
2,19,1,2,3,4,5,10

In MySQL, [[:<:]] and [[:>:]] match the beginning and ending of a "word", and a , is not "part of a word" (but two numbers next to each other are considered a "word" still).

For example:

mysql> SELECT * FROM test WHERE csv REGEXP '[[:<:]]2[[:>:]].*[[:<:]]9[[:>:]]';
+----+----------------------+
| id | csv                  |
+----+----------------------+
|  1 | 1,2,3,4,5,6,7,8,9,10 |
+----+----------------------+
1 row in set (0.00 sec)

PCRE

I had originally thought this was a PCRE question, but it was a MySQL REGEXP question! However, in case someone finds it useful, I'll leave this information about PCRE here.

This regex is basically equivalent to the MySQL REGEXP above:

^.*\b2\b.*\b9\b.*$

Using the link above can help you visualize the match.

\b is a "word boundary" (and means basically the same thing as [[:<:]] or [[:>:]] in MySQL), preventing us from matching digits that are part of other numbers.

Note, if you're trying to match the entire multi-line block of text at once, use the m PCRE modifier (PCRE_MULTILINE flag) so that ^ and $ anchor at the beginning and end of each line, rather than the whole string.

So, in PHP, we'd use:

preg_match('/^.*\b2\b.*\b9\b.*$/', $csvRow);

Or:

preg_match('/^.*\b2\b.*\b9\b.*$/mg', $wholeCsvFile);
Comments