GiuseppeP GiuseppeP - 2 months ago 7
SQL Question

SQL show duplicates after join

I'm trying to select all duplicated (removing the distinct record) entry of a table, result of a join.

I'm trying to do it with:

SELECT items.id, b.title FROM b WHERE b.title IN (
SELECT b.title FROM b LIMIT 20 GROUP BY b.title HAVING COUNT(*)>1
) INNER JOIN items USING (number)


Note:
number
is a Index common to 'items' and 'b' tables used for the join.

But obvious that doesn't work. I'm trying to understand how to get the duplicates and return the join result.

An example of output should be:

id | title
----|----
001 | House
002 | House
005 | Tree
010 | Tree
006 | Car
007 | Car

Answer

Using COUNT(*) in a subquery incurs overhead. I would suggest:

SELECT i.id, b.title
FROM b JOIN
     items i
     USING (number)
WHERE EXISTS (SELECT 1
              FROM b b2
              WHERE b2.title = b.title AND b2.id <> b.id
             );

This does assume assume that b has a unique id on each row, which is used in the subquery. This query can make use of an index on b(title, id).