Ben Ben - 3 months ago 10
SQL Question

How to add a running count to rows in a 'streak' of consecutive days

Thanks to Mike for the suggestion to add the create/insert statements.

create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);

insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');


I want to add a new column that is 'days in current streak'
so the result would look like:

pid | date      | in_streak
----|-----------|----------
1   | 2014-10-1 | 1
1   | 2014-10-2 | 2
1   | 2014-10-3 | 3
1   | 2014-10-5 | 1
1   | 2014-10-7 | 1
2   | 2014-10-1 | 1
2   | 2014-10-2 | 2
2   | 2014-10-3 | 3
2   | 2014-10-5 | 1
2   | 2014-10-7 | 1


I've been trying to use the answers from a related question, but I can't work out how to use the dense_rank() trick with other window functions to get the right result.

Answer

Building on this table (avoiding the SQL keyword "date" as a column name):

CREATE TABLE tbl(
  pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
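
To follow along, the sample rows from the question can be loaded into the renamed table. This is a minimal sketch, assuming PostgreSQL and the tbl definition above; the dates are written as ISO literals, which is a choice made here and not part of the original post.

```sql
-- Same rows as in the question, inserted into tbl(pid, the_date).
INSERT INTO tbl (pid, the_date) VALUES
  (1, '2014-10-01')
, (1, '2014-10-02')
, (1, '2014-10-03')
, (1, '2014-10-05')
, (1, '2014-10-07')
, (2, '2014-10-01')
, (2, '2014-10-02')
, (2, '2014-10-03')
, (2, '2014-10-05')
, (2, '2014-10-07');
```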

Query:

SELECT pid, the_date
    , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM  (
   SELECT *, the_date - '2000-01-01'::date - row_number() 
           OVER (PARTITION BY pid ORDER BY the_date) AS grp
   FROM   tbl
) sub
ORDER  BY pid, the_date;

Subtracting one date from another yields an integer (the number of days between them). Within a streak of consecutive days, that number grows by one per row, just like row_number(). So if we subtract row_number() from it, every row of a streak ends up with the same value (grp) per pid. Then it's simple to number the rows within each group.
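
To make the grouping step visible, the intermediate values can be selected side by side. This is a sketch assuming PostgreSQL; the column aliases day_no and rn are hypothetical names introduced here, and the inline VALUES list repeats the question's rows for pid 1 so the statement runs without any table.

```sql
-- Show the pieces that make up grp for one pid.
SELECT pid, the_date
     , the_date - date '2000-01-01' AS day_no   -- days since a fixed reference date
     , row_number() OVER (PARTITION BY pid ORDER BY the_date) AS rn
     , the_date - date '2000-01-01'
       - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM  (
   VALUES (1, date '2014-10-01')
        , (1, date '2014-10-02')
        , (1, date '2014-10-03')
        , (1, date '2014-10-05')
        , (1, date '2014-10-07')
   ) tbl(pid, the_date)
ORDER  BY pid, the_date;

-- pid | the_date   | day_no | rn | grp
--   1 | 2014-10-01 |   5387 |  1 | 5386
--   1 | 2014-10-02 |   5388 |  2 | 5386
--   1 | 2014-10-03 |   5389 |  3 | 5386
--   1 | 2014-10-05 |   5391 |  4 | 5387
--   1 | 2014-10-07 |   5393 |  5 | 5388
```

The first three rows share grp = 5386 because day_no and rn advance in lockstep; each gap in the dates bumps day_no by more than one, which starts a new group.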

grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:

the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp

One multiplication, one subtraction. Here grp is a timestamp rather than an integer, which works just as well for grouping. String concatenation and casting are more expensive. Test with EXPLAIN ANALYZE.
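
One way to run that comparison is to wrap each variant of the full query in EXPLAIN ANALYZE and compare the reported execution times. The sketch below, assuming PostgreSQL and the tbl table from above, shows the interval variant; the integer variant can be timed the same way by swapping the grp expression.

```sql
-- EXPLAIN ANALYZE executes the query and reports actual timings.
EXPLAIN ANALYZE
SELECT pid, the_date
     , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM  (
   SELECT *, the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date)
                      * interval '1 day' AS grp   -- grp is a timestamp here
   FROM   tbl
   ) sub
ORDER  BY pid, the_date;
```

On a table as small as the sample, any difference is likely to be lost in noise, so a meaningful test probably needs a much larger data set and several runs.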

Don't forget to partition by pid in both steps, or you'll inadvertently mix groups that should be kept separate.

I use a subquery since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.

And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.
