user1483799 user1483799 - 3 months ago 7
MySQL Question

Writing stored procedure which flags duplicate values in a comma separated field in MySQL

I have a database table like this sample:

ID THINGS HAS_DUPLICATES
1 AAA, BBB, AAA NULL
2 CCC, DDD NULL


I am trying to write a stored procedure to flag duplicate values in THINGS field.
After calling the procedure the table will become like this:

ID THINGS HAS_DUPLICATES
1 AAA, BBB, AAA YES
2 CCC, DDD NO


Please be informed that I am trying to resolve it using only SQL and without normalizing my database. I am also aware of other approaches like writing PHP code.

Answer

Here is the answer to my question, assuming the data in THINGS field are separated by a bar '|'. Our original table will be myTABLE:

ID   THINGS            THINGSCount  THINGSCountUnique   HAS_DUPLICATES
1    AAA|BBB|AAA       NULL         NULL                NULL
2    CCC|DDD           NULL         NULL                NULL

Step 1. Check the maximum number of values separated by a bar '|' in THINGS field:

SELECT ROUND((CHAR_LENGTH(THINGS) - CHAR_LENGTH(REPLACE(THINGS,'|',''))) / CHAR_LENGTH('|')) + 1 FROM myTABLE;

Step 2. Assuming the answer from step 1 was 7, now use the following SQL to split the data in THINGS field into rows, there are many other approaches which you can Google to do the split:

CREATE TABLE myTABLE_temp 
SELECT ID, SUBSTRING_INDEX(SUBSTRING_INDEX(myTABLE.THINGS, '|', n.n), '|', -1) THINGS 
FROM myTABLE JOIN 
( SELECT n FROM 
( SELECT 1 AS N UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 ) a ) n 
ON CHAR_LENGTH(THINGS) - CHAR_LENGTH(REPLACE(THINGS, '|', '')) >= n - 1 
ORDER BY ID;

Our myTABLE_temp table will be something like:

ID   THINGS
1    AAA
1    BBB
1    AAA
2    CCC
2    DDD

Step 3. Here we create two new tables to hold COUNT(THINGS) and COUNT(DISTINCT THINGS) as following:

# THINGSCount
CREATE TABLE myTABLE_temp_2 
SELECT ID, COUNT(THINGS) AS THINGSCount FROM myTABLE_temp GROUP BY ID;
# Remember to ADD INDEX to ID field
UPDATE myTABLE A INNER JOIN myTABLE_temp_2 B ON(A.ID = B.ID) SET A.THINGSCount = B.THINGSCount;

# THINGSCountUnique
CREATE TABLE myTABLE_temp_3 
SELECT ID, COUNT(THINGS) AS THINGSCountUnique FROM myTABLE_temp GROUP BY ID;
# Remember to ADD INDEX to ID field
UPDATE myTABLE A INNER JOIN myTABLE_temp_3 B ON(A.ID = B.ID) SET A.THINGSCountUnique = B.THINGSCountUnique;

Final Step: Flag duplicate values:

UPDATE myTABLE SET HAS_DUPLICATES = IF(THINGSCount>THINGSCountUnique, 'DUPLICATES', 'NO');