SQL Question

Calculations from table data

I am trying to calculate some counts and average timespans from the data in the following table:



+---------+--------+------+-----------+----------+
| STATS_I | TASK_I | TYPE | CREATE_TS | END_TS   |
+---------+--------+------+-----------+----------+
|       1 |    111 | A    | 30-05-16  | 30-05-16 |
|       2 |    111 | A    | 30-05-16  |          |
|       3 |    111 | B    | 30-05-16  | 30-05-16 |
|       4 |    111 | B    | 30-05-16  |          |
|       5 |    111 | C    | 30-05-16  | 30-05-16 |
|       6 |    222 | D    | 30-05-16  |          |
|       7 |    222 | D    | 30-05-16  | 30-05-16 |
|       8 |    222 | C    | 30-05-16  | 30-05-16 |
|       9 |    222 | C    | 30-05-16  | 30-05-16 |
|      10 |    222 | C    | 30-05-16  |          |
|      11 |    333 | A    | 30-05-16  | 30-05-16 |
|      12 |    333 | B    | 30-05-16  | 30-05-16 |
|      13 |    333 | B    | 30-05-16  | 31-05-16 |
|      14 |    333 | D    | 30-05-16  | 30-05-16 |
|      15 |    333 | D    | 30-05-16  | 31-05-16 |
|      16 |    444 | D    | 30-05-16  | 30-05-16 |
|      17 |    444 | D    | 30-05-16  | 31-05-16 |
|      18 |    444 | C    | 30-05-16  | 30-05-16 |
|      19 |    444 | B    | 30-05-16  | 30-05-16 |
|      20 |    444 | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+


The sample table can be populated with:

CREATE TABLE "STATS" ("STATS_I" NUMBER(10,0), "TASK_I" NUMBER(10,0), "TYPE" VARCHAR2(30), "CREATE_TS" DATE, "END_TS" DATE);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (1,111,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (2,111,'A',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (3,111,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (4,111,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (5,111,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (6,222,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (7,222,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (8,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (9,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (10,222,'C',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (11,333,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (12,333,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (13,333,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (14,333,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (15,333,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (16,444,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (17,444,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (18,444,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (19,444,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (20,444,'A',to_date('30-05-16','DD-MM-RR'),to_date('31-05-16','DD-MM-RR'));


Note that end_ts and create_ts are kept at a 10-minute difference.

The output I'm trying to get is:

+--------+--------------+-------------------+-------------------+-------+
| Task_i |      A       |         B         |         C         |   D   |
|        +-------+------+-------+-----------+-------+-----------+-------+
|        | Count | Read | Count | Avg. Time | Count | Avg. Time | Count |
|        |       |      |       | to Read   |       | to Read   |       |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 111    |     2 |    1 |     2 | 10 min    |     1 | 10 min    |     0 |
| 222    |     0 |    0 |     0 | 0 min     |     3 | 10 min    |     2 |
| 333    |     1 |    1 |     2 | 10 min    |     0 | 0 min     |     2 |
| 444    |     1 |    1 |     1 | 10 min    |     1 | 10 min    |     2 |
+--------+-------+------+-------+-----------+-------+-----------+-------+


Where, for each task_i:


  • 'Count' is the count of rows for the type;

  • 'Read' is the count of rows for the type where end_ts is not null;

  • 'Average time to read' is calculated as (end_ts - create_ts) / count of the type, ignoring the rows where end_ts is null.



So far I have tried to create four tables, one for each type, and then join them on task_i:

(((select count(*) as A from stats s where s.type='A' group by s.TYPE UNION ALL
select count(*) as B from stats s where s.type='B' group by s.TYPE ) UNION ALL
select count(*) as C from stats s where s.type='C' group by s.TYPE ) UNION ALL
select count(*) as D from stats s where s.type='D' group by s.TYPE ) ;


but this doesn't produce anything like what I need:

A
----------
4
5
4
7


What am I doing wrong, and how can I generate the required output?

Answer

Your union just stacks the overall count for each type into a single column, without any grouping by task_i, which is why you get four unrelated totals. The counts, which are the part you attempted, can instead be found with a conditional count (a case expression inside the count) rather than with a union:

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT    B_COUNT    C_COUNT    D_COUNT
---------- ---------- ---------- ---------- ----------
       111          2          2          1          0
       222          0          0          2          3
       333          1          2          0          2
       444          1          1          1          2

The number of 'reads', where end_ts is not null, can be found by modifying the case expression:

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ    B_COUNT     B_READ    C_COUNT     C_READ    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1          2          1          1          1          0
       222          0          0          0          0          2          1          3
       333          1          1          2          1          0          0          2
       444          1          1          1          1          1          1          2

The elapsed time can be found in much the same way, but using avg() instead of count():

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    B_COUNT     B_READ B_READTIME    C_COUNT     C_READ C_READTIME    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1     1.0035          2          1     1.0035          1          1     1.0035          0
       222          0          0                     0          0                     2          1     1.0035          3
       333          1          1     1.0035          2          1     1.0035          0          0                     2
       444          1          1     1.0035          1          1     1.0035          1          1     1.0035          2

The date arithmetic means those averages are shown in fractions of a day; I've also amended your data slightly, adding five minutes to each end time, to more closely match your original results. To get the string output you need, you have to manipulate those values a bit, and that's less repetitive if you use an inline view to do the calculation:

select task_i,
  a_count,
  a_read,
  case when a_readtime > 1 then trunc(24 * a_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(a_readtime, 0), 1)) || ' min'
    as a_readtime
from (
  select task_i,
    count(case when type = 'A' then 1 end) as a_count,
    count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
    avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
    count(case when type = 'B' then 1 end) as b_count,
    count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
    avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
    count(case when type = 'C' then 1 end) as c_count,
    count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
    avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
    count(case when type = 'D' then 1 end) as d_count
  from stats
  group by task_i
)
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    
---------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min   
       222          0          0 0 min            
       333          1          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min   

... and repeat the outer query select list items for the same data for B, C and D.
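
Spelled out in full, that repeated outer select list would look something like this (the same hours/minutes formatting expression duplicated for the B and C averages; D only has a count in the inner query, so there is nothing to format for it):

select task_i,
  a_count,
  a_read,
  case when a_readtime > 1 then trunc(24 * a_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(a_readtime, 0), 1)) || ' min'
    as a_readtime,
  b_count,
  b_read,
  case when b_readtime > 1 then trunc(24 * b_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(b_readtime, 0), 1)) || ' min'
    as b_readtime,
  c_count,
  c_read,
  case when c_readtime > 1 then trunc(24 * c_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(c_readtime, 0), 1)) || ' min'
    as c_readtime,
  d_count
from (
  -- same inline view as above
  select task_i,
    count(case when type = 'A' then 1 end) as a_count,
    count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
    avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
    count(case when type = 'B' then 1 end) as b_count,
    count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
    avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
    count(case when type = 'C' then 1 end) as c_count,
    count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
    avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
    count(case when type = 'D' then 1 end) as d_count
  from stats
  group by task_i
)
order by task_i;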

If you're on 11g or higher, you could use pivot to avoid repeating the translation of the time difference into a string:

select * from (
  select task_i, type, type_count, type_read,
    case when type_readtime > 1 then trunc(24 * type_readtime) || ' hrs ' end
      || round(60 * mod(24 * type_readtime, 1)) || ' min'
    as type_readtime
  from (
    select task_i,
      type,
      count(*) as type_count,
      count(case when end_ts is not null then 1 end) as type_read,
      avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
    from stats
    group by task_i, type
  )
)
pivot (max(type_count) as count, max(type_read) as read, max(type_readtime) as readtime
  for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d))
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME         B_COUNT     B_READ B_READTIME         C_COUNT     C_READ C_READTIME         D_COUNT     D_READ D_READTIME    
---------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min             2          1 24 hrs 5 min             1          1 24 hrs 5 min                                         
       222                                                                                      2          1 24 hrs 5 min             3          2 24 hrs 5 min   
       333          1          1 24 hrs 5 min             2          1 24 hrs 5 min                                                   2          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min             1          1 24 hrs 5 min             1          1 24 hrs 5 min             2          1 24 hrs 5 min   

... but if you don't want null values, you'll need to wrap the generated columns in nvl() calls in the final select list, which makes it messier again.
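
That last step would look something like this, with select * replaced by an explicit list of the columns the pivot generates, each wrapped in nvl() (defaulting the counts to 0 and the formatted times to '0 min' is just one choice of placeholder, picked to match the earlier non-pivot output):

select task_i,
  nvl(a_count, 0) as a_count, nvl(a_read, 0) as a_read, nvl(a_readtime, '0 min') as a_readtime,
  nvl(b_count, 0) as b_count, nvl(b_read, 0) as b_read, nvl(b_readtime, '0 min') as b_readtime,
  nvl(c_count, 0) as c_count, nvl(c_read, 0) as c_read, nvl(c_readtime, '0 min') as c_readtime,
  nvl(d_count, 0) as d_count, nvl(d_read, 0) as d_read, nvl(d_readtime, '0 min') as d_readtime
from (
  -- same aggregation and time formatting as the pivot query above
  select task_i, type, type_count, type_read,
    case when type_readtime > 1 then trunc(24 * type_readtime) || ' hrs ' end
      || round(60 * mod(24 * type_readtime, 1)) || ' min'
    as type_readtime
  from (
    select task_i,
      type,
      count(*) as type_count,
      count(case when end_ts is not null then 1 end) as type_read,
      avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
    from stats
    group by task_i, type
  )
)
pivot (max(type_count) as count, max(type_read) as read, max(type_readtime) as readtime
  for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d))
order by task_i;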