Nimbocrux Nimbocrux - 2 months ago 13
SQL Question

Return the first n records per group Oracle SQL

My problem



I want to return the top n rows per group ordered by a date in Oracle 10g

My table



EMPLOYEE|START_DATE|DEPARTMENT
Amy |01-02-1901|Sales
Edwina |01-02-1902|Mergers
Tawnee |01-02-1904|Legal
Trudy |01-02-1998|Sales
Tanner |01-02-1967|Sales
Kelly |01-02-1954|Mergers
Jenny |01-02-1991|Sales
Jacinta |01-02-1924|Legal
Suzanne |01-02-1976|Legal
Jacqui |01-02-1989|Legal
Jill |01-02-1989|Mergers
Kate |01-02-1998|Mergers
Jane |01-02-1900|Sales
Louise |01-02-1912|Mergers
Kim |01-02-1976|Sales
Cara |01-02-1955|Sales
Kirsten |01-02-1933|Legal
Sarah |01-02-1998|Legal


Desired outcome



EMPLOYEE|START_DATE|DEPARTMENT
Jane |01-02-1900|Sales
Amy |01-02-1901|Sales
Tawnee |01-02-1904|Legal
Jacinta |01-02-1924|Legal
Sarah |01-02-1998|Legal
Edwina |01-02-1902|Mergers
Louise |01-02-1912|Mergers


What I've tried



(select * from
employees where
DEPARTMENT = 'Sales' and
rownum <3;)
UNION
(select * from
employees where
DEPARTMENT = 'Legal' and
rownum <3;)
UNION
(select * from
employees where
DEPARTMENT = 'Mergers' and
rownum <3;)


REALLY ugly query

I'm thinking if there was a way you could you an

OVER (PARTITION BY DEPARTMENT)


but from what I read, this needs to be preceded by an analytic function (count, sum whatever). Is there a more elegant, inexpensive solution?

Answer

Consider this non-Windows function approach using a count correlated aggregate query. The idea is to run a department rank subquery and then use that in a derived table that filters outer query by this department rank. Please note your desired results do not return by ordered START_DATE but simply query's row number.

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM 
   (SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT, 
          (SELECT Count(*) FROM Employees sub 
           WHERE sub.START_DATE <= t.START_DATE 
           AND sub.Department = t.Department) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

-- EMPLOYEE START_DATE  DEPARTMENT
-- Tawnee   1/2/1904    Legal
-- Jacinta  1/2/1924    Legal
-- Kirsten  1/2/1933    Legal
-- Edwina   1/2/1902    Mergers
-- Louise   1/2/1912    Mergers
-- Kelly    1/2/1954    Mergers
-- Jane     1/2/1900    Sales
-- Amy      1/2/1901    Sales
-- Cara     1/2/1955    Sales

For the Windows function counterpart:

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM 
   (SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT, 
           RANK() OVER (PARTITION BY Department
           ORDER BY START_DATE) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

And as @Matt comments, you may want to handle ties (i.e., employees who started on same day). Both above solutions will output all such employees depending on rank filter. To take one of the ties in correlated subquery, use Employee name as tiebreaker (or better yet a unique ID if available):

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM 
   (SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT, 
           (SELECT Count(*) FROM Employees sub 
            WHERE sub.Department =  t.Department 
             AND (sub.START_DATE <= t.START_DATE
                  OR sub.START_DATE = t.START_DATE 
                  AND sub.EMPLOYEE < t.EMPLOYEE) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

And for window-function query use ROW_NUMBER() in place of RANK().

Comments