Josephine Campbell Josephine Campbell - 5 months ago 12
SQL Question

Efficient way of calculating time spent by machines in different states with a large dataset

I'm having trouble figuring out a good approach to this problem. I have a dataset that looks like this:


MACHINE DATETIME CODE
C54118PC1 2016-04-01 00:00:01.000 10
C54118PC1 2016-04-01 00:01:12.000 4
C54118PC1 2016-04-01 00:01:36.000 10
C54118PC1 2016-04-01 00:01:50.000 4
C54123PC1 2016-04-01 00:00:02.000 0
C54123PC1 2016-04-01 01:00:02.000 0


Ultimately, I want to display this in a Qlikview document as a pie chart that shows how long a machine has spent in each state(code).

In order to do this, I think I need to get the data in this format:


MACHINE DATE CODE DURATION
Machine1 07/06/2016 2 07:30:14
Machine1 07/06/2016 5 02:45:10
Machine2 07/06/2016 0 12:37:05
Machine2 07/06/2016 5 04:04:59
Machine1 08/06/2016 2 02:57:13
Machine1 08/06/2016 4 04:12:05
Machine1 08/06/2016 5 03:07:12
Machine3 08/06/2016 8 02:49:23


This would be 1 row for each code, per day, per machine with the total duration for that day and code.

The other consideration is that this is a large dataset - there are currently at least 2 million rows and this is likely to grow quite quickly as more machines are added and more time passes.

So, basically I need a really efficient way of calculating the time spent in different states(codes), by machine and date. I'm thinking that this would be best done in SQL however I could also do it within Qlikview if this worked out to be more efficient. Can anyone help?

Thank you in advance!







EDIT: Sorry, I forgot to explain how the logging works! We have a bunch of machines that alternate between various states during the day (such as printing, idle, cleaning). Each time they change states (e.g. from printing to idle), they log a timestamp along with their new state.

(They also log timestamps and their current states at regular intervals, even if they haven't actually changed state.)

Duration is calculated as the time difference between two consecutive timestamps.

Answer

Try this one

Declare @Table table (machine varchar(25),DateTime datetime,Code int)
Insert into @Table (Machine,DateTime,Code) values 
('Machine1','2016-04-01 00:00:01.000',10),
('Machine1','2016-04-01 00:01:12.000',4),
('Machine1','2016-04-01 00:01:36.000',10),
('Machine1','2016-04-01 00:01:50.000',4),
('Machine2','2016-04-01 00:00:02.000',0),
('Machine2','2016-04-01 01:00:02.000',0),
('Machine1','2016-04-02 00:00:01.000',10),
('Machine1','2016-04-02 00:01:12.000',4),
('Machine1','2016-04-02 00:01:36.000',10),
('Machine1','2016-04-02 00:01:50.000',4),
('Machine2','2016-04-02 00:00:02.000',0),
('Machine2','2016-04-02 01:00:02.000',0)


IF OBJECT_ID('tempdb.dbo.#Temp1', 'U') IS NOT NULL
    DROP TABLE #Temp1; 

-- Generate Date Range
Select Distinct DateR1=cast(DateTime as Date),DateR2=DateAdd(DD,1,cast(DateTime as Date)) into #Temp1 from @Table
Create Index idx on #Temp1 (DateR1,DateR2)

;with cteBase as (
        Select *
             ,MSDate1 = DateTime
             ,MSDate2 = Lead(DateTime,1,DateAdd(DD,1,cast(DateTime as Date)))  over (Partition by Machine Order by Machine,DateTime) 
         From @Table
      )
Select Machine
      ,Date = DateR1
      ,Code
      ,Duration = CONVERT(varchar, DATEADD(SS, sum(DateDiff(SS,case when MSDate1<=DateR1 then DateR1 else MSDate1 end,case when MSDate2>DateR2 then DateR2 else MSDate2 end)), 0), 114)
 From #Temp1 A
 Join cteBase B on (MSDate1 between DateR1 and DateR2 or MSDate2 between DateR1 and DateR2)
 Group By Machine
      ,DateR1
      ,Code

Returns

Machine         Date        Code    Duration
Machine1        2016-04-01  4       23:58:34:000
Machine1        2016-04-01  10      00:01:25:000
Machine1        2016-04-02  4       23:58:35:000
Machine1        2016-04-02  10      00:01:25:000
Machine2        2016-04-01  0       23:59:58:000
Machine2        2016-04-02  0       00:00:00:000
Comments