Stephen Tableau Stephen Tableau - 26 days ago 15
SQL Question

How to write SQL query that selects distinct pair values for specific criteria?

I'm having trouble formulating a query for the following problem:

For pair values that have a certain score, how do you group them in way that will only return distinct pair values with the best respective scores?

For example, lets say I have a table with the following row values:

(t1,p1,65)
(t1,p2,60)
(t1,p3,20)
(t2,p1,60)
(t2,p2,59)
(t2,p3,15)


The first two columns indicate the pair values and the third column represents the pair score.The best score is
(t1,p1,65)
. Since t1 and p1 are now used, I want to exclude them from further analysis.

The next best score is
(t2,p2,59)
. Even though
(t1,p2)
has a score of 60, I want to exclude it because "t1" has already been used. (
t2,p1)
also has a score of 60, but since p1 is also already used, this pair is excluded.

This results in the distinct pair score values of:

(t1,p1,65)
(t2,p2,59)


Is there any way to generate this result with just a query? I've tried to think of ways of grouping and partitioning the results, but since there has to be some accounting of values already used according to score rank, I'm finding this very difficult to approach.

EDIT:

To generate the data:

with t(t, p, score) as (
(values ('t1','p1',65),
('t1','p2',60),
('t1','p3',20),
('t2','p1',60),
('t2','p2',59),
('t2','p3',15)
))
select t.* from t;

Answer

It is relatively simple using the stored function:

--drop function if exists f();
--drop table if exists t;
create table t(x text,y text, z int);
insert into t values
  ('t1','p1',65),
  ('t1','p2',60),
  ('t1','p3',20),
  ('t2','p1',60),
  ('t2','p2',59),
  ('t2','p3',15)/*,
  ('t3','p1',20),
  ('t3','p2',60),
  ('t3','p3',40)*/;

create function f() returns setof t immutable language plpgsql as $$
declare
  ax text[];
  ay text[];
  r t;
begin
  ax := '{}'; ay := '{}';
  loop
    select * into r
      from t
      where x <> all(ax) and y <> all(ay)
      order by z desc, x, y limit 1;
    exit when not found;
    ax := ax || r.x; ay := ay || r.y;
    return next r;
  end loop;
end $$;

select * from f();
╔════╤════╤════╗
║ x  │ y  │ z  ║
╠════╪════╪════╣
║ t1 │ p1 │ 65 ║
║ t2 │ p2 │ 59 ║
╚════╧════╧════╝

However if uncomment the third bunch of the values the result will be different:

╔════╤════╤════╗
║ x  │ y  │ z  ║
╠════╪════╪════╣
║ t1 │ p1 │ 65 ║
║ t3 │ p2 │ 60 ║
║ t2 │ p3 │ 15 ║
╚════╧════╧════╝

Upd: and the equivalent using recursive CTE on the same test data:

with recursive r as (
  (select x, y, z, array[x] as ax, array[y] as ay from t order by z desc, x, y limit 1)
  union all
  (select t.x, t.y, t.z, r.ax || t.x, r.ay || t.y from t, r 
  where not (t.x = any(r.ax) or t.y = any(r.ay)) 
  order by t.z desc, t.x, t.y limit 1))
select * from r;