Richard Walker Richard Walker - 4 months ago 11
SQL Question

Getting distinct highest revision number

Background



I'm struggling to write a query to select a record with the highest revision number for each id across 3 tables. the worksite id is shared across all 3 tables so a query for worksitedraft.id = 103 and worksitesubmission.id = 103 would each return the same worksite in a different state.

tables



All 3 tables worksite, worksiteDraft, worksiteSubmission have the columns:

id, name, reference, majorrevisionnumber, minorrevisionnumber

Essentially I need to union all three tables together then distinct the results but make sure that for each result it is the record with the highest revision number from across the 3 tables.

Current solution



So far I have the following (working) solution but it's pretty nasty and feels like there must be a better way.

Create a view of all worksites across the 3 tables unioned together:

CREATE VIEW allworksites AS
SELECT * FROM
(
SELECT id, name, reference, 'worksiteDraft' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksitedraft
UNION
SELECT id, name, reference, 'worksiteSubmission' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksitesubmission
UNION
SELECT id, name, reference, 'worksite' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksite
) as allworksites;


Then use a nested select to calculate the type via ordering by major/minor revision number and limiting to 1

SELECT * FROM allworksites
WHERE id = 106
AND type =

(
SELECT type FROM
(
SELECT
id,
type,
revisionnumber
FROM
(
SELECT 'worksiteDraft' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksitedraft
WHERE id = allworksites.id
UNION ALL
SELECT 'worksiteSubmission' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksitesubmission
WHERE id = allworksites.id
UNION ALL
SELECT 'worksite' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
FROM worksite
WHERE id = allworksites.id
) as latestrevision
ORDER BY revisionnumber DESC
limit 1
) as latestrevisiontype
)


Question



Is there another (perhaps a totally different) better approach to querying this data?

EDIT. Added sample data and desired result as requested

Sample data



worksitedraft



| "id"| "name" | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One" | "ref-1" | 1 | 0
| 102 | "Worksite Two" | "ref-2" | 1 | 0
| 103 | "Worksite Three"| "ref-3" | 1 | 0
| 104 | "Worksite Four" | "ref-4" | 2 | 0
| 105 | "Worksite Five" | "ref-5" | 2 | 0
| 106 | "Worksite Six" | "ref-6" | 3 | 0


worksitesubmission



| "id"| "name" | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One" | "ref-1" | 1 | 1
| 102 | "Worksite Two" | "ref-2" | 1 | 2
| 103 | "Worksite Three"| "ref-3" | 1 | 2
| 104 | "Worksite Four" | "ref-4" | 1 | 2
| 105 | "Worksite Five" | "ref-5" | 2 | 1
| 106 | "Worksite Six" | "ref-6" | 2 | 2


worksite



| "id"| "name" | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One" | "ref-1" | 1 | 2
| 102 | "Worksite Two" | "ref-2" | 1 | 3
| 103 | "Worksite Three"| "ref-3" | 1 | 3
| 104 | "Worksite Four" | "ref-4" | 1 | 2
| 105 | "Worksite Five" | "ref-5" | 2 | 2
| 106 | "Worksite Six" | "ref-6" | 2 | 3


Desired result



| "id"| "name" | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One" | "ref-1" | 1 | 2
| 102 | "Worksite Two" | "ref-2" | 1 | 3
| 103 | "Worksite Three"| "ref-3" | 1 | 3
| 104 | "Worksite Four" | "ref-4" | 2 | 0
| 105 | "Worksite Five" | "ref-5" | 2 | 2
| 106 | "Worksite Six" | "ref-6" | 3 | 0


Note: I'm using postgresql 9.4

EDIT: Final solution after suggestions from @Gordon_Linoff and @a_horse_with_no_name

SELECT DISTINCT ON (id) *
FROM (SELECT id, name, reference, 'worksiteDraft' as type, majorrevisionnumber, minorrevisionnumber
FROM worksitedraft
UNION ALL
SELECT id, name, reference, 'worksiteSubmission' as type, majorrevisionnumber, minorrevisionnumber
FROM worksitesubmission
UNION ALL
SELECT id, name, reference, 'worksite' as type, majorrevisionnumber, minorrevisionnumber
FROM worksite
) allworksites
ORDER BY id, ARRAY[majorrevisionnumber, minorrevisionnumber]::int[] DESC;

Answer

You can do this with a single query. I think the simplest method is DISTINCT ON:

SELECT DISTINCT ON (id) * 
FROM (SELECT id, name, reference, 'worksiteDraft' as type,
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksitedraft
      UNION ALL
      SELECT id, name, reference, 'worksiteSubmission' as type,      
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksitesubmission
      UNION ALL
      SELECT id, name, reference, 'worksite' as type, 
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksite
     ) allworksites
ORDER BY id, revisionnumber DESC;

Notes:

  • Use UNION ALL instead of UNION. UNION incurs overhead for removing duplicate values, which is unnecessary overhead.
  • Unless your revisions are formatted correctly, concatenating them may not return the latest value. (i.e., 1.9 comes after 1.12 in sort order). However, this is the logic used in the question.
  • DISTINCT ON returns one row per id, the first one encountered, based on the ORDER BY clause.
  • A view is not necessary.