Jeremy Jeremy - 2 years ago 130
SQL Question

Update PostgreSQL table with values from self

I am attempting to update multiple columns on a table with values from another row in the same table:

pid INT
,name VARCHAR(40)
,dob DATE
,younger_sibling_name VARCHAR(40)
,younger_sibling_dob DATE

INSERT INTO person VALUES (pid, name, dob)
(1, 'John', '1980-01-05'),
(2, 'Jimmy', '1975-04-25'),
(3, 'Sarah', '2004-02-10'),
(4, 'Frank', '1934-12-12');

The task is to populate
with the name and birth day of the person that is closest to them in age, but not older or the same age.

I can set the younger sibling
easily because this is the value that determines the record to use with a correlated subquery (I think this is an example of that?):

UPDATE person SET younger_sibling_dob=(
FROM person AS sibling
WHERE sibling.dob < person.dob);

I just can't see any way to get the

The real query of this will run over about 1M records in groups of 100-500 for each MAX selection so performance is a concern.


After trying many different approaches, I've decided on this one which I think is a good
balance of being able to verify the data with the intermediate result, shows the
intention of what the logic is, and performs adequately:

WITH sibling AS (
SELECT, sibling.dob,,
row_number() OVER (PARTITION BY
ORDER BY sibling.dob DESC) AS age_closeness
FROM person
JOIN person AS sibling ON sibling.dob < person.dob
UPDATE person
SET younger_sibling_name =
,younger_sibling_dob = sibling.dob
FROM sibling
AND sibling.age_closeness = 1;

SELECT * FROM person ORDER BY dob;

Answer Source

Correlated subqueries are infamous for abysmal performance. Doesn't matter much for small tables, matters a lot for big tables. Use one of these instead, preferably the second:

Query 1

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY dob) AS drk
   FROM   person
UPDATE person p
SET    younger_sibling_name =
      ,younger_sibling_dob  = y.dob
FROM   cte x
JOIN   (SELECT DISTINCT ON (drk) * FROM cte) y ON y.drk = x.drk + 1

-> SQLfiddle (with extended test case)

  • In the CTE cte use the window function dense_rank() to get a rank without gaps according to the dop for every person.

  • Join cte to itself, but remove duplicates on dob from the second instance. Thereby everybody gets exactly one UPDATE. If more than one person share the same dop, the same one is selected as younger sibling for all persons on the next dob. I do this with:

    (SELECT DISTINCT ON (rnk) * FROM cte)

    Add ORDER BY rnk, ... if you want to pick a particular person for every dob.

  • If no younger person exists, no UPDATE happens and the columns stay NULL.

  • Indices on dob and pid make this fast.

Query 2

WITH cte AS (
   SELECT dob, min(name) AS name
         ,row_number() OVER (ORDER BY dob) rn
   FROM   person p
   GROUP  BY dob
UPDATE person p
SET    younger_sibling_name =
      ,younger_sibling_dob  = y.dob
FROM   cte x
JOIN   cte y ON y.rn = x.rn + 1
WHERE  x.dob = p.dob;

-> SQLfiddle

  • This works, because aggregate functions are applied before window functions. And it should be very fast, since both operations agree on the sort order.

  • Obviates the need for a later DISTINCT like in query 1.

  • Result is the same as query 1, exactly.
    Again, you can add more columns to ORDER BY to pick a particular person for every dob.

  • Only needs an index on dob to be fast.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download