Nath Papadacis Nath Papadacis - 6 months ago 11
SQL Question

SQL Group By with Sum for each date and using max date

I've seen a lot of similar questions but nothing that quite nails my particular problem.

I have a table storing multiple positions for each account. Changes are stored as deltas. So take for example on day 1 the following...

AC_ID | POS_ID | ASAT | VAL
1 | 1 | 2016-01-01 | 100
1 | 2 | 2016-01-01 | 200


The total value for
AC_ID
1 is 300 on 01/01/2016.The next day it may update to be...

AC_ID | POS_ID | ASAT | VAL
1 | 1 | 2016-01-01 | 100
1 | 2 | 2016-01-01 | 200
1 | 2 | 2016-01-02 | 250


Now the total value for
AC_ID
1 is 350. This is because the new record for
POS_ID
2 overrides the previous, but the value for
POS_ID
1 has not changed. In order to remove
POS_ID
1 the table would change to something like...

AC_ID | POS_ID | ASAT | VAL
1 | 1 | 2016-01-01 | 100
1 | 2 | 2016-01-01 | 200
1 | 2 | 2016-01-02 | 250
1 | 1 | 2016-01-03 | 0


Now the value changes to 250 on day 3.

I can calculate the value at any given date with a subquery like so

SELECT SUM(VAL) FROM POSITION P1
WHERE P1.ASAT =
(SELECT MAX(P2.ASAT) FROM POSITION P2
WHERE P1.AC_ID = P2.AC_ID
AND P1.POS_ID = P2.POS_ID
AND P2.DATE <= [CHOSEN DATE])


What I'd like to do now is write a single query that will give me the total value for every
AC_ID
for every
ASAT
. If not for the delta storage mechanism I could easily achieve this using

SELECT AC_ID, ASAT, SUM(VAL) FROM POSITION
GROUP BY AC_ID, ASAT
ORDER BY ASAT DESC


What I'm looking for is something that will achieve the above but take into account the join back on the table. If I use the above then I'll only get totals for anything that changed on the
ASAT
date and not all of the existing values that haven't changed.

In the above example that should equate to a resultset of

AC_ID | ASAT | SUM(VAL)
1 | 2016-01-01 | 300
1 | 2016-01-02 | 350
1 | 2016-01-03 | 250


Here's another example of data vs output

AC_ID | POS_ID | ASAT | VAL
1 | 1 | 2016-01-01 | 100
1 | 2 | 2016-01-01 | 200
1 | 2 | 2016-01-02 | 250
1 | 1 | 2016-01-03 | 0
2 | 1 | 2016-01-02 | 500
3 | 7 | 2016-01-02 | 1000
3 | 7 | 2016-01-03 | 1000
3 | 12 | 2016-01-03 | 5000
2 | 1 | 2016-01-04 | 750


Result

AC_ID | ASAT | SUM(VAL)
1 | 2016-01-01 | 300
1 | 2016-01-02 | 350
1 | 2016-01-03 | 250
2 | 2016-01-02 | 500
2 | 2016-01-04 | 750
3 | 2016-01-02 | 1000
3 | 2016-01-03 | 6000


I CHANGED HOW THIS WORKS

Although the answers below worked the performance of them was shockingly bad (through no fault of the authors!) In order to get this to something acceptable (I need sub-second return) I refactored the table to include an
end_date
column. This column gets updated on each insert to set the life span of that row. If a row doesn't have a superseding entry then the end date is set to 9999-12-31. My example above becomes...

AC_ID | POS_ID | ASAT | END_DATE | VAL
1 | 1 | 2016-01-01 | 2016-01-03 | 100
1 | 2 | 2016-01-01 | 2016-01-02 | 200
1 | 2 | 2016-01-02 | 9999-12-31 | 250
1 | 1 | 2016-01-03 | 9999-12-31 | 0
2 | 1 | 2016-01-02 | 2016-01-04 | 500
3 | 7 | 2016-01-02 | 2016-01-03 | 1000
3 | 7 | 2016-01-03 | 9999-12-31 | 1000
3 | 12 | 2016-01-03 | 9999-12-31 | 5000
2 | 1 | 2016-01-04 | 9999-12-31 | 750


I can then remove the second join from accepted answer and add an extra clause to the inner join.

SELECT
p1.AC_ID,
p1.ASAT,
SUM(p2.VAL) as totalValue
FROM
(SELECT DISTINCT AC_ID, ASAT FROM position) p1
INNER JOIN position p2 ON
p2.AC_ID = p1.AC_ID AND
p2.ASAT <= p1.ASAT AND
p2.END_DATE > p1.END_DATE
GROUP BY
p1.AC_ID,
p1.ASAT;

Answer

This should give you what you need:

SELECT
    P1.ac_id,
    P1.asat,
    SUM(P2.val) AS total_value
FROM
    (SELECT DISTINCT P.ac_id, P.asat FROM dbo.Position P) P1
INNER JOIN dbo.Position P2 ON
    P2.ac_id = P1.ac_id AND
    P2.asat <= P1.asat
LEFT OUTER JOIN dbo.Position P3 ON
    P3.ac_id = P1.ac_id AND
    P3.pos_id = P2.pos_id AND
    P3.asat > P2.asat AND
    P3.asat <= P1.asat
WHERE
    P3.ac_id IS NULL
GROUP BY
    P1.ac_id,
    P1.asat

The query gets you all of your ac_id/asat combinations, then grabs any rows that might fall into those that need to be totaled, and finally uses the LEFT OUTER JOIN and check for NULL to eliminate any rows that aren't the latest for that particular pos_id.