Karl Karl - 5 months ago 8
SQL Question

Can SQL be used to return a distributed sample set?

I need a sample dataset, basically to reduce the number of rows so that the sample can be loaded into one Excel sheet.

I've looked at the analytic functions and this Ask Tom post, but I'm not able to get the results I need, basically because I don't see my needs being able to utilize an analytical grouping.

Here's a representation of the data. Each row is unique.

Cat_No
is the primary key.


CAT_NO |PROD_CODE |DESCRIPTION |COLOR |
SH1 |SH |Shirt |Blue |
SH2 |SH |Shirt |Red |
BT1 |BT |Belt |Black |
BT4 |BT |Belt |Brown |
WT1 |WT |Watch |Silver|
WT6 |WT |Watch |Gold |


Is it possible to write SQL query that will return one row with all columns for each
PROD_CODE
? If, what is it?

Hopefully there is an ANSI SQL solution as I have to use both PSQL and TSQL. Also, performance is not much of a consideration even though tables may have millions of rows. This is for one time execution to have sample data for documentation.

For example, the result could look like this:


CAT_NO |PROD_CODE |DESCRIPTION |COLOR |
SH1 |SH |Shirt |Blue |
BT4 |BT |Belt |Brown |
WT6 |WT |Watch |Gold |


One row for each
PROD_CODE
. Any row is acceptable.

jpw jpw
Answer

If it doesn't matter what data you get back per prod_code you could use row_number() over (partition by prod_code order by cat_no) as row_num and filter the resulting set on row_num = 1.

Something like this:

SELECT CAT_NO, PROD_CODE, DESCRIPTION, COLOR 
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY PROD_CODE ORDER BY CAT_NO) AS row_num
  FROM TABLE
) t
WHERE t.row_num = 1
Comments