Ryan Barker Ryan Barker - 3 months ago 9
SQL Question

Oracle SQL Group By Column Value Rank

I have recently started to learn database queries in Oracle, but there is one thing I am having a lot of trouble understanding about grouping.

It is best explained with example. Say I have source data that looks like this:

MY_SOURCE

ID | Fruit
----------
1 | Orange
1 | Apple
1 | Orange
2 | Banana
2 | Apple
3 | Apple
3 | Apple
3 | Orange


Suppose certain values of fruit are have different rank/priority. Say the rank (1=most important; 3=least important) is:

Fruit Rank:
1. Banana
2. Orange
3. Apple


I would like to group data by ID. When I do so, there will be duplicates for the fruit column for each ID in the data source which have to be aggregated. For instance, for ID one, the set of possible values is:

Orange, Apple, Orange


At this point, instead of aggregating the duplicate fruit sets with something like stats_mode(), I would like to group the data by fruit rank and only display the value with the highest rank. So, the output would be:

ID | Fruit
----------
1 | Orange
2 | Banana
3 | Orange


Is there a way to accomplish this in SQL?

I am imagining a query that looks something like:

SELECT DISTINCT
ID,
MAGIC_MAX_RANK_FUNCTION(FRUIT, ['Banana','Orange', 'Apple'])
FROM
MY_SOURCE


In technical terms, we are grouping the duplicate fruit data according to a rank specified by the programmer for all of that column's possible values.

Thank you in advance!

Answer

Instead of GROUP BY, I would just use ROW_NUMBER():

select s.*
from (select s.*,
             row_number() over (partition by id
                                order by (case fruit when 'banana' then 1 when 'orange' then 2 when 'apple' then 3 else 999 end)
                               ) as seqnum
      from my_source s
     ) s
where seqnum = 1;

Another method uses union all and is best for short lists:

select s.*
from my_source s
where s.fruit = 'banana'
union all
select s.*
from my_source s
where s.fruit = 'orange' and
      not exists (select 1 from my_source s2 where s2.id = s.id and s2.fruit in ('banana'))
union all
select s.*
from my_source s
where s.fruit = 'apple' and
      not exists (select 1 from my_source s2 where s2.id = s.id and s2.fruit in ('banana', 'orange'));

Under some circumstances, this method might be faster than the previous method (and vice versa).