MidwestProgrammingDJ MidwestProgrammingDJ - 27 days ago 10
SQL Question

Combining consecutive rows based on a "type" column

I am looking for ideas and solutions T-SQL to combine consecutive records like the example below.

The source database I'm working with will have audit records, and a column called "Audit_Type" which could contain many different things such as "Saved Form" "Exported Record", "Imported Record", or "Viewed Record" etc. This database ends up with a bunch of extraneous records for the "Saved Form" type, because the app that makes this database autosaves forms as a user edits it fairly regularly. So often there will be a bunch of "Saved Form" records in a row.
To picture it:

ID Audit Type DateTime
1 "Viewed Record" 2017-01-03 11:16:33.000
2 "Saved Form" 2017-01-04 09:51:36.837
3 "Saved Form" 2017-01-04 09:52:40.837
4 "Saved Form" 2017-01-04 09:52:44.837
5 "Saved Form" 2017-01-04 09:52:49.837
6 "Saved Form" 2017-01-04 09:52:54.837
7 "Saved Form" 2017-01-04 09:54:59.837
8 "Exported Record" 2017-01-04 09:55:59.837


Question 1. I would like to combine these consecutive "Saved Form" records into one record, by grabbing consecutive "Saved Form" records and combining them into one record that uses the timestamp of the very last "Saved Form" before loading it into my target database. Something like this

ID Audit Type DateTime
1 "Viewed Record" 2017-01-03 11:16:33.000
7 "Saved Form" 2017-01-04 09:54:59.837
8 "Exported Record" 2017-01-04 09:55:59.837


I have tried a few approaches so far but I would like to hear ideas.

Question 2. Based on my research and reading other similar questions on SO, I saw this could be similar to an Islands and Gaps problem, is this an accurate characterization of this problem?

EDIT
This is for SQL Server 2012.
I'm extracting from a database that I have no control over how it logs the information.

Also to clarify, there are other columns from this log table, I ommitted for brevity, so in the example above, we can assume all of the "Saved Form" records are from the same session and same user

Answer Source

Gordon beat me to the answer. Yes, this does fit into an Islands and Gaps methodology. I think a LEAD() would suit this issue quite well. But I also tried a second query with a ROW_NUMBER(), which produced a slightly shorter execution plan. Not sure which would be more efficient at scale. That would require a bit more testing.

NOTE 1: I also added an assumed SessionID and UserID to my query. The additional columns may change your final results.

NOTE 2: SQL Fiddle reports that the ROW_NUMBER version runs much faster with fewer "same time" entries, but LEAD version is faster with many "same time" entries.

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE foo ( ID int IDENTITY, sessionID int, userid int, AuditType varchar(50), [DateTime] datetime ) ;
INSERT INTO foo ( sessionID, userID, AuditType, [DateTime] )
VALUES 
      (1,1,'Viewed Record','2017-01-03 11:16:33.000')
    , (1,1,'Saved Form','2017-01-04 09:51:36.837')
    , (2,2,'Viewed Record','2017-01-04 09:52:00.000')
    , (1,1,'Saved Form','2017-01-04 09:52:40.837')
    , (1,1,'Saved Form','2017-01-04 09:52:44.837')
    , (2,2,'Saved Form','2017-01-04 09:52:45.000')
    , (2,2,'Saved Form','2017-01-04 09:52:46.000')
    , (2,2,'Saved Form','2017-01-04 09:52:47.000')
    , (2,2,'Saved Form','2017-01-04 09:52:48.000')
    , (1,1,'Saved Form','2017-01-04 09:52:49.837')
    , (1,1,'Saved Form','2017-01-04 09:52:54.837')
    , (2,2,'Exported Record','2017-01-04 09:53:00.000')
    , (1,1,'Saved Form','2017-01-04 09:54:59.837')
    , (1,1,'Exported Record','2017-01-04 09:55:59.837')
    , (2,1,'Viewed Record','2017-01-04 10:00:00.000')
    , (2,1,'Saved Form','2017-01-04 10:02:00.000')
    , (2,1,'Saved Form','2017-01-04 10:04:00.000')
    , (2,1,'Saved Form','2017-01-04 10:06:00.000')
    , (2,1,'Exported Record','2017-01-04 10:10:00.000')
;

Query 1 (LEAD()):

SELECT s1.sessionID
  , s1.userID
  , s1.AuditType
  , s1.[DateTime]
FROM (
    SELECT foo.*
      , LEAD(foo.AuditType) OVER ( ORDER BY foo.userID, foo.sessionID, foo.[DateTime] ) AS next_type
    FROM foo
  ) s1
WHERE s1.next_type IS NULL OR s1.next_type <> s1.AuditType
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| sessionID | userID |       AuditType |                 DateTime |
|-----------|--------|-----------------|--------------------------|
|         1 |      1 |   Viewed Record |     2017-01-03T11:16:33Z |
|         1 |      1 |      Saved Form | 2017-01-04T09:54:59.837Z |
|         1 |      1 | Exported Record | 2017-01-04T09:55:59.837Z |
|         2 |      1 |   Viewed Record |     2017-01-04T10:00:00Z |
|         2 |      1 |      Saved Form |     2017-01-04T10:06:00Z |
|         2 |      1 | Exported Record |     2017-01-04T10:10:00Z |
|         2 |      2 |   Viewed Record |     2017-01-04T09:52:00Z |
|         2 |      2 |      Saved Form |     2017-01-04T09:52:48Z |
|         2 |      2 | Exported Record |     2017-01-04T09:53:00Z |

Query 2 (ROW_NUMBER()):

SELECT s1.*
FROM (
    SELECT foo.*
      , ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID, foo.AuditType ORDER BY foo.userID, foo.sessionID, foo.[DateTime] DESC ) AS rn
    FROM foo
  ) s1
WHERE rn = 1
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| ID | sessionID | userid |       AuditType |                 DateTime | rn |
|----|-----------|--------|-----------------|--------------------------|----|
|  1 |         1 |      1 |   Viewed Record |     2017-01-03T11:16:33Z |  1 |
| 13 |         1 |      1 |      Saved Form | 2017-01-04T09:54:59.837Z |  1 |
| 14 |         1 |      1 | Exported Record | 2017-01-04T09:55:59.837Z |  1 |
| 15 |         2 |      1 |   Viewed Record |     2017-01-04T10:00:00Z |  1 |
| 18 |         2 |      1 |      Saved Form |     2017-01-04T10:06:00Z |  1 |
| 19 |         2 |      1 | Exported Record |     2017-01-04T10:10:00Z |  1 |
|  3 |         2 |      2 |   Viewed Record |     2017-01-04T09:52:00Z |  1 |
|  9 |         2 |      2 |      Saved Form |     2017-01-04T09:52:48Z |  1 |
| 12 |         2 |      2 | Exported Record |     2017-01-04T09:53:00Z |  1 |

They should both show:

  1,1,'Viewed Record','2017-01-03 11:16:33.000'
  1,1,'Saved Form','2017-01-04 09:54:59.837'
  1,1,'Exported Record','2017-01-04 09:55:59.837'

  2,1,'Viewed Record','2017-01-04 10:00:00.000'
  2,1,'Saved Form','2017-01-04 10:06:00.000'
  2,1,'Exported Record','2017-01-04 10:10:00.000'

  2,2,'Viewed Record','2017-01-04 09:52:00.000'
  2,2,'Saved Form','2017-01-04 09:52:48.000'
  2,2,'Exported Record','2017-01-04 09:53:00.000'