Evgeni Evgeni - 2 months ago 10
SQL Question

SQL Random sample table with duplicates

Assuming I have the following table:

unitid | transtatus | currency
---------------------------------------
1024393230 | not_started | GBp - Pence
1024397398 | in_progress | GBp - Pence
1024397398 | not_started | USd - Cent
1024397408 | not_started | GBp - Pence
1024397408 | not_started | EUR
1024401371 | not_started | GBp - Pence
1024403375 | in_progress | GBp - Pence


I want to select random rows for QC I can do it by

select top 3
tbble.unitid,tbble.transtatus, tbble.currency
from tbble
order by newid()


However since some rows share the same unitid (and if this is the case) I want to pull all the rows associated with this unitid

So the query would return: (in case random rows only has one row for this unitid)

unitid | transtatus | currency
---------------------------------------
1024393230 | not_started | GBp - Pence
1024401371 | not_started | GBp - Pence
1024403375 | in_progress | GBp - Pence


or : (in case of two rows associated with this unitid)

1024397398 | in_progress | GBp - Pence
1024397398 | not_started | USd - Cent
1024401371 | not_started | GBp - Pence
1024403375 | in_progress | GBp - Pence


Not really sure how can I achieve this. Maybe counting the number of unitid appearances first and then if count is over 1 then adding these to initial random sample?

Answer

I think this accomplishes what you want, which is three random unit ids and all their rows:

select t.*
from tbble t join
     (select top 3 t.unitid
      from (select distinct t.unitid from tbble t) t
      order by newid()
     ) tt
     on t.unitid = tt.unitid
Comments