kvietok kvietok - 26 days ago 17
SQL Question

sql query - selection of a correct row from a groups specified by id

id risk origin strength strength_sol
13456 1 1 3 3
13456 134 0 5 NULL
13456 128 0 7 NULL
13456 121 0 5 NULL
13456 122 0 4 NULL
13456 190 0 2 NULL
22367 1 1 5 5
22367 128 0 4 NULL
22367 1 0 2 NULL
22367 36 0 6 NULL
12789 1 1 5 5
12789 1 0 4 NULL
12789 118 1 2 NULL
12789 118 1 5 NULL
12789 1 0 7 NULL
16908 1 0 5 5
16908 36 0 4 NULL
16908 28 1 3 NULL
16908 128 1 5 NULL
16908 1 0 7 NULL
12439 1 0 4 4
12439 134 0 2 NULL
12439 16 0 5 NULL
15678 36 0 4 NULL
15678 28 0 2 NULL
15678 134 0 5 NULL


Hello, can anybody please help me how to figure out my problem? Thank you very much in advance.
Problem and data desription:
I have a big dataset. Above you can see just a small sample in order to describe my problem.
I need to choose exactly one row for each id.
In the dataset above there are all the possible cases that can happen.
The last two columns are not a part of the data set, it is the result that I need to get.Origin is a 0/1 variable.

I need to choose this:
for one id:
1. situation: when risk = 1 and origin = 1 - Im ok, I will take this row, there can be the only row like this for one id in the dataset
2.situation: when for one id there is no case that risk=1 and origin =1, I have to choose the row where risk=1 and origin = 0,
if there are more such a rows, it doesnt matter which one I choose(But I have to choose only ONE of them, not all of them).
3. when there in no risk = 1 in any row for one id (doesnt matter what is the value of the origin), I just simply put NULL as strength_sol

my solution is like this (but it is not correct)
case when risk=1 and origin =1 then strength
when risk=1 and origin = 0 then strength
else NULL end as strength

This solution is not correct, because in the situation number one can happen that there is also a row
with risk=1 and origin=0, but I m not interested in that row(I want NULL for that row).
I cant figure out how can I solve it correctly. Thanks.

vkp vkp
Answer

You can use the row_number function to number the rows so that the first row would be the one with highest priority (in this case risk=1 and origin=1) and the second with the next highest priority (risk=1 and origin=0). All the other rows are numbered arbitrarily and then you can choose the first row from each group.

select id,risk,origin,strength,
case when rnum=1 then strength end strength_sol
from (select t.*,
      row_number() over(partition by id 
                        order by case when risk=1 and origin =1 then 1 
                                      when risk=1 and origin =0 then 2 
                                 else 3 end) rnum
      from t
      ) x 
Comments