user113531 - 11 months ago

SQL select groups with minimum sizes

I didn't manage to find an answer for this, so here goes: how do you write a SQL query that first groups the rows of a table by a few columns and then keeps only the groups that have more rows than a specified size?

Here is an example of what I am trying to achieve with a pandas DataFrame:

    df.groupby(cols_to_group).filter(lambda g: len(g) > minimum_group_size)
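For readers less familiar with pandas, the semantics of that one-liner can be sketched roughly in plain Python: count the rows per group key, then keep every row whose group has more rows than the threshold (strictly greater, as in the lambda). The function name `filter_groups` and the sample `rows` are hypothetical, for illustration only.

```python
from collections import Counter

def filter_groups(rows, key_cols, minimum_group_size):
    """Keep every row whose group (rows sharing the same values in key_cols)
    has more than minimum_group_size rows. Hypothetical helper, not pandas."""
    def key(row):
        return tuple(row[c] for c in key_cols)

    sizes = Counter(key(r) for r in rows)  # number of rows per group key
    # Rows are kept whole and in their original order, like groupby().filter().
    return [r for r in rows if sizes[key(r)] > minimum_group_size]

# Hypothetical sample data: group sizes are (1, "x") -> 2, (1, "y") -> 1, (2, "x") -> 3.
rows = [
    {"a": 1, "b": "x", "v": 10},
    {"a": 1, "b": "x", "v": 11},
    {"a": 1, "b": "y", "v": 12},
    {"a": 2, "b": "x", "v": 13},
    {"a": 2, "b": "x", "v": 14},
    {"a": 2, "b": "x", "v": 15},
]

kept = filter_groups(rows, ["a", "b"], 1)
print([r["v"] for r in kept])  # prints [10, 11, 13, 14, 15]
```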

How can I do the same thing with a SQL query?


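I think a solution is the following: use a window function to attach each group's row count to every row, then filter on that count in an outer query: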

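    -- cols_to_group: comma-separated list of grouping columns, e.g. col1, col2
    SELECT *
    FROM (
        SELECT *
            , COUNT(*) OVER (PARTITION BY cols_to_group) AS cnt  -- cnt is the number of rows in each group
        FROM yourTable
    ) t
    WHERE t.cnt > minimum_group_size;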
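To check the query end to end, here is a small sketch using Python's built-in `sqlite3` module with an in-memory table. The table contents and the grouping columns `a, b` (standing in for `cols_to_group`) are made up for illustration, and the window-function version assumes SQLite 3.25 or newer. It also runs a swapped-in alternative, the classic `GROUP BY ... HAVING COUNT(*) > n` subquery joined back to the table, which also works in databases without window functions.

```python
import sqlite3

# In-memory database with a hypothetical table "yourTable" (columns a, b, v).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (a INTEGER, b TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO yourTable (a, b, v) VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 11), (1, "y", 12), (2, "x", 13), (2, "x", 14), (2, "x", 15)],
)

minimum_group_size = 1  # keep groups with more than 1 row

# Window-function version (requires SQLite 3.25+ for OVER / PARTITION BY).
window_sql = """
SELECT a, b, v
FROM (
    SELECT *
        , COUNT(*) OVER (PARTITION BY a, b) AS cnt
    FROM yourTable
) t
WHERE t.cnt > ?
ORDER BY v
"""

# GROUP BY / HAVING version: find the qualifying group keys, then join back.
having_sql = """
SELECT y.a, y.b, y.v
FROM yourTable y
JOIN (
    SELECT a, b
    FROM yourTable
    GROUP BY a, b
    HAVING COUNT(*) > ?
) g ON g.a = y.a AND g.b = y.b
ORDER BY y.v
"""

window_rows = conn.execute(window_sql, (minimum_group_size,)).fetchall()
having_rows = conn.execute(having_sql, (minimum_group_size,)).fetchall()
print(window_rows)  # prints [(1, 'x', 10), (1, 'x', 11), (2, 'x', 13), (2, 'x', 14), (2, 'x', 15)]
conn.close()
```

One difference worth knowing: the join version drops rows whose grouping columns are NULL, because `NULL = NULL` is not true in SQL, whereas `PARTITION BY` places all NULLs in a single partition.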