Christian Christian - 1 month ago 9
SQL Question

Group by bucket (with NULL values)

I have the following tables:


  1. entries (id, title, text, duplicate_bucket_id)

  2. duplicate_buckets (id, comment)



So every entry can be in a duplicate bucket. Now I want to get all entries without the duplicates:

SELECT MIN(id) FROM entries GROUP BY duplicate_bucket_id


The problem with this query is that it also groups all the entries without a duplicate_bucket_id to only one entry with NULL.

So I need something like this

(SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id)
UNION
(SELECT id FROM entries WHERE duplicate_bucket_id IS NULL)


This query gives me the correct result, but ActiveRecord can't use UNIONs.

Alternatively, I can use this query with a subquery:

SELECT * FROM entries WHERE duplicate_bucket_id IS NULL OR id IN
(SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id )


In this query, I must place additional where-clauses in AND outside of the subquery. So the query gets quite complicated and I don't know yet, how to use the Ransack Gem with such a query...

The query would be simple, if every "entry" would be in a "duplicate_bucket" - buckets of size 1 (I could use *SELECT * FROM entries GROUP BY duplicate_bucket_id*). But I want to avoid to have entries in a duplicate_bucket, if the entry don't have a duplicate. Is there a simple query (no unions, no subqueries) to get all entries without their duplicates?

Dataset



entries(id, title, text, duplicate_bucket_id)

1, 'My title', 'Bla bla', 1
2, 'Hello', 'Jaha', 1
3, 'Test', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2
6, 'Bar2', '', 2


duplicate_buckets (id, comment)

1, 'This bucket has 3 entries'
2, 'Bar1 and Bar2 are duplicates!'


Result



1, 'My title', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2

Answer

ANSI/ISO SQL:

select      * 
from        entries as e1
where       not exists (select null from entries as e2 where e2.duplicate_bucket_id = e1.duplicate_bucket_id and e2.id < e1.id)
;

MySQL Terrible, Horrible, No Good, Very Bad syntax

select      * 
from        entries 
group by    coalesce(duplicate_bucket_id,id)
;
Comments