shuvrow shuvrow - 7 months ago 26
SQL Question

How to use regexp on the results of a sub query?

I have two tables.

User
which has id and phone number




id phone_no

1 ---- 9912678

2 ---- 9912323

3 ---- 9912366




Admission Table , which has id phone number

id phone_no

6 --- 991267823

7 --- 991236621

8 --- 435443455

9 --- 243344333

I want to find all the phone number of Admission's table which has same pattern as users table.

So i am trying this

select phone_no from admission where phone_no REGEXP (SELECT phone_no
FROM `users` AS user
WHERE user.phone_no REGEXP '^(99)+[0-9]{8}')


But i am getting this error Subquery returns more than 1 row

Looking for help.

Answer

Try one of these queries:

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no LIKE concat(u.phone_no, '__')
WHERE u.phone_no REGEXP  '^[99]+[0-9]+$'

or

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no REGEXP concat('^', u.phone_no, '[0-9]{2}$')
WHERE u.phone_no REGEXP  '^[99]+[0-9]+$'

If the number of "trailing digits" is not fixed, you can also use:

LIKE concat(u.phone_no, '%')

or

REGEXP concat('^', u.phone_no, '[0-9]*$')

But in this case you might need to use SELECT DISTICT a.phone_no if it is possible that a users.phone_no is a subsequence of an other user users.phone_no (e.g. 99123 and 991234).

Update

After running some tests with 10K rows for users table and 100K rows for admission table i came to the following query:

SELECT a.phone_no
FROM admission a
JOIN users u 
    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')
    AND a.phone_no LIKE CONCAT(u.phone_no, '%')
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]*$')
WHERE   u.phone_no LIKE  '99%'
    AND u.phone_no REGEXP  '^[99]+[0-9]*$'
UNION SELECT 0 FROM (SELECT 0) dummy WHERE 0

fiddle

This way you can use REGEXP and still have great performance. This query executes almost instantly in my test case.

Logically you only need the REGEXP conditions. But on bigger tables the query might time out. Using a LIKE condition will filter the result set before REGEXP check. But even using LIKE the query doesn't perform very well. For some reason MySQL doesn't use a range check for the join. So i added an explicit range check:

    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')

With this check you can remove the LIKE condition from the JOIN part.

The UNION part is a replacement for DISTICT. MySQL seems to translate DISTINCT into a GROUP BY statement, which doesn't perform well. Using UNION with an empty result set i force MySQL to remove duplicates after the SELECT. You can remove that line, if you use a fixed number of trailing digits.

You can adjust the REGEXP patterns to your needs:

...
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]{2}$')
...
    AND u.phone_no REGEXP  '^[99]+[0-9]{8}$'
...

If you only need REGEXP to check the length of the phone_no, you can also use a LIKE condition with the '_' placeholder.

    AND a.phone_no LIKE CONCAT(u.phone_no, '__')
...
    AND u.phone_no LIKE '99________$'

or combine a LIKE condition with a STR_LENGTH check.