nikolifish nikolifish - 2 months ago 15
SQL Question

row_number with partition value changing

I have this piece of sql code that finds the row_number based on some values in a table __working that joins into a lookup table __Eval

;WITH r
AS (
select

w.uid,
t.TypeId,
--weight
ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY DIFFERENCE(t.val1, w.val1) + DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4) DESC) as Score
,w.account

from __Working w
join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3

)

select * from r where r.account = 1 and score = 1


this returns a typeId = 1

however if I write it like this

;WITH r
AS (
select

w.uid,
t.TypeId,
--weight
ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY DIFFERENCE(t.val1, w.val1) + DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4) DESC) as Score
,w.account

from __Working w
join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3
where r.account = 1
)

select * from r where r.account = 1 and score = 1


it returns TypeId = 2. I would expect that if i had multiple UIDs across different accounts in __working, but I don't. What am i missing here?

Answer

Oh, this is a weirdness of unstable sorts. Your row_number() expression is:

 ROW_NUMBER() OVER (PARTITION BY w.uid
                    ORDER BY  DIFFERENCE(t.val1, w.val1) +
                              DIFFERENCE(t.val2, w.val2) +
                              DIFFERENCE(t.val3, w.val3) +
                              DIFFERENCE(t.val4, w.val4)  DESC
                   ) as Score

The problem is that multiple rows have the same value for the ORDER BY key. Different invocations arbitrary choose which of these multiple rows is first, second, and so on.

The canonical solution is to include some sort of unique key so the sort is stable:

 ROW_NUMBER() OVER (PARTITION BY w.uid
                    ORDER BY  (DIFFERENCE(t.val1, w.val1) +
                               DIFFERENCE(t.val2, w.val2) +
                               DIFFERENCE(t.val3, w.val3) +
                               DIFFERENCE(t.val4, w.val4)
                              ) DESC,
                              ??  -- perhaps typeId
                   ) as Score

However, I might suggest a more arduous solution. Accept the fact that ties might exist, and use rank() and dense_rank() to identify them. Then, figure out explicitly what to do in the case of a tie -- perhaps all are equally interesting to you or perhaps you have some other method of breaking ties.

Comments