idlackage idlackage - 5 months ago 8
SQL Question

How to limit the number of rows returned for each group

I have a table that stores one record for each item in a filesystem, where an item can be both a folder an an actual file.

ID | Name | Date | Parent ID
-----------------------------------
0 | someFolder | xxx | NULL
1 | a.txt | yyy | 0
2 | b.txt | zzz | 0


So if I have a folder structure that goes like this:

mainFolder (ID = 0)
folder1
a.txt
b.txt
folder2
c.txt
d.txt


And I want to find all leaf nodes, 'grouped' by what root folder they're in, I use this query:

select id, name, date, connect_by_root name as "Group" from myTable
where connect_by_isleaf = 1
start with parentid = 0
connect by prior id = parentid


Which gets me something along the lines of this output:

ID | Name | Date | Group
---------------------------
3 | a.txt | xxx | folder1
4 | b.txt | yyy | folder1
8 | c.txt | zzz | folder2
9 | d.txt | xyz | folder2


What I want to do is limit the number of rows returned for each 'group'. For example, even if both folders had more than 2 items, I would want only the first two (in terms of most recent date) from each. How would I do this?

Answer

So I'm attempting to use the row_number() analytic to assign a row number to for each file in a group. starting at 1 going to X and then use a where clause to limit the row_number to just the 2 files desired... Since the row_number has to materialize before we can apply a where clause to it, I need to use a subselect or CTE.

Not sure how well a CTE and connect by prior along with a row_number will play together... May have to use 2 CTE's

I doubt I have the syntax perfect without testing; but this convey's a general concept.

1st attempt:

With CTE AS (
select id, name, date, connect_by_root name as "Group",
ROW_NUMBER() over (partition by connect_by_root name order by ID ) RN
from myTable
where connect_by_isleaf = 1
start with parentid = 0
connect by prior id = parentid)
Select * from cte where RN <= 2

Second attempt:

With CTE AS (
select id, name, date, connect_by_root name as "Group" from myTable
where connect_by_isleaf = 1
start with parentid = 0
connect by prior id = parentid),

CTE2 as (Select A.*, 
        Row_number() over (partition by Group order by ID) RN from CTE A)
Select * from cte2 where RN <= 2
Comments