djCode - 2 years ago
SQL Question

Calculations from table data

I am trying to calculate some counts and average timespans from the data in this table:



+---------+--------+------+-----------+----------+
| STATS_I | TASK_I | TYPE | CREATE_TS | END_TS   |
+---------+--------+------+-----------+----------+
|       1 |    111 | A    | 30-05-16  | 30-05-16 |
|       2 |    111 | A    | 30-05-16  |          |
|       3 |    111 | B    | 30-05-16  | 30-05-16 |
|       4 |    111 | B    | 30-05-16  |          |
|       5 |    111 | C    | 30-05-16  | 30-05-16 |
|       6 |    222 | D    | 30-05-16  |          |
|       7 |    222 | D    | 30-05-16  | 30-05-16 |
|       8 |    222 | C    | 30-05-16  | 30-05-16 |
|       9 |    222 | C    | 30-05-16  | 30-05-16 |
|      10 |    222 | C    | 30-05-16  |          |
|      11 |    333 | A    | 30-05-16  | 30-05-16 |
|      12 |    333 | B    | 30-05-16  | 30-05-16 |
|      13 |    333 | B    | 30-05-16  |          |
|      14 |    333 | D    | 30-05-16  | 30-05-16 |
|      15 |    333 | D    | 30-05-16  |          |
|      16 |    444 | D    | 30-05-16  | 30-05-16 |
|      17 |    444 | D    | 30-05-16  |          |
|      18 |    444 | C    | 30-05-16  | 30-05-16 |
|      19 |    444 | B    | 30-05-16  | 30-05-16 |
|      20 |    444 | A    | 30-05-16  | 31-05-16 |
+---------+--------+------+-----------+----------+


The sample table can be populated with:

CREATE TABLE "STATS" ("STATS_I" NUMBER(10,0), "TASK_I" NUMBER(10,0), "TYPE" VARCHAR2(30), "CREATE_TS" DATE, "END_TS" DATE);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (1,111,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (2,111,'A',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (3,111,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (4,111,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (5,111,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (6,222,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (7,222,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (8,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (9,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (10,222,'C',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (11,333,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (12,333,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (13,333,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (14,333,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (15,333,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (16,444,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (17,444,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (18,444,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (19,444,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (20,444,'A',to_date('30-05-16','DD-MM-RR'),to_date('31-05-16','DD-MM-RR'));


Note that end_ts and create_ts are kept 10 minutes apart.

The output I'm trying to get is:

+--------+---------------+---------------------+---------------------+-------+
| Task_i |       A       |          B          |          C          |   D   |
|        +-------+-------+-------+-------------+-------+-------------+-------+
|        | Count | Read  | Count | Avg. Time   | Count | Avg. Time   | Count |
|        |       |       |       | to Read     |       | to Read     |       |
+--------+-------+-------+-------+-------------+-------+-------------+-------+
|    111 |     2 |     1 |     2 | 10 min      |     1 | 10 min      |     0 |
|    222 |     0 |     0 |     0 | 0 min       |     3 | 10 min      |     2 |
|    333 |     1 |     1 |     2 | 10 min      |     0 | 0 min       |     2 |
|    444 |     1 |     1 |     1 | 10 min      |     1 | 10 min      |     2 |
+--------+-------+-------+-------+-------------+-------+-------------+-------+


Where, for each task_i:


  • 'Count' is the number of rows for the type;

  • 'Read' is the number of rows for the type where end_ts is not null;

  • 'Average time to read' is the average of (end_ts - create_ts) over the rows
    for the type, ignoring rows where end_ts is null.
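To make the per-type calculation concrete, this is roughly what I'm after for a single type (a sketch of the intent, not working code; the * 24 * 60 factor converts Oracle's day-fraction date arithmetic to minutes):

select task_i,
  count(*) as a_count,
  count(end_ts) as a_read,      -- count(end_ts) skips null end times
  avg((end_ts - create_ts) * 24 * 60) as a_avg_minutes_to_read
from stats
where type = 'A'
group by task_i;
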



So far I have tried to create four result sets, one for each type, and then join them on task_i:

(((select count(*) as A from stats s where s.type='A' group by s.TYPE UNION ALL
select count(*) as B from stats s where s.type='B' group by s.TYPE ) UNION ALL
select count(*) as C from stats s where s.type='C' group by s.TYPE ) UNION ALL
select count(*) as D from stats s where s.type='D' group by s.TYPE ) ;


but this doesn't produce anything like what I need:

A
----------
4
5
4
7


What am I doing wrong, and how can I generate the required output?

Answer

The counts, which are the part you attempted, can be found with a conditional count (a case expression inside count()) rather than with a union:

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT    B_COUNT    C_COUNT    D_COUNT
---------- ---------- ---------- ---------- ----------
       111          2          2          1          0
       222          0          0          2          3
       333          1          2          0          2
       444          1          1          1          2

The number of 'reads', where end_ts is not null, can be found by modifying the case expression:

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ    B_COUNT     B_READ    C_COUNT     C_READ    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1          2          1          1          1          0
       222          0          0          0          0          2          1          3
       333          1          1          2          1          0          0          2
       444          1          1          1          1          1          1          2

The elapsed time can be found in much the same way, but using avg() instead of count():

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    B_COUNT     B_READ B_READTIME    C_COUNT     C_READ C_READTIME    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1     1.0035          2          1     1.0035          1          1     1.0035          0
       222          0          0                     0          0                     2          1     1.0035          3
       333          1          1     1.0035          2          1     1.0035          0          0                     2
       444          1          1     1.0035          1          1     1.0035          1          1     1.0035          2

The date arithmetic means those values are shown in fractions of a day. (I've amended your data slightly to add five minutes to each end time, to match your original results more closely.) To get the string output you need, you have to manipulate those values a bit; that's less repetitive if you use an inline view to do the calculation:

select task_i,
  a_count,
  a_read,
  case when a_readtime > 1 then trunc(24 * a_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(a_readtime, 0), 1)) || ' min'
    as a_readtime
from (
  select task_i,
    count(case when type = 'A' then 1 end) as a_count,
    count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
    avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
    count(case when type = 'B' then 1 end) as b_count,
    count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
    avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
    count(case when type = 'C' then 1 end) as c_count,
    count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
    avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
    count(case when type = 'D' then 1 end) as d_count
  from stats
  group by task_i
)
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    
---------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min   
       222          0          0 0 min            
       333          1          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min   

... and repeat the outer query's select-list items to get the same columns for B, C and D.
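The day-fraction-to-string conversion itself can be sanity-checked against dual; for instance, 10 minutes is 10/1440 of a day (the literals here are just illustrative values, not taken from the table):

select round(60 * mod(24 * (10 / 1440), 1)) || ' min' as short_time,
       trunc(24 * 1.0035) || ' hrs '
         || round(60 * mod(24 * 1.0035, 1)) || ' min' as long_time
from dual;

The first expression produces '10 min', and the second '24 hrs 5 min', matching the formatted averages above.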

If you're on 11g or higher you could use pivot to avoid repeating the translation of the time difference into a string:

select * from (
  select task_i, type, type_count, type_read,
    case when type_readtime > 1 then trunc(24 * type_readtime) || ' hrs ' end
      || round(60 * mod(24 * type_readtime, 1)) || ' min'
    as type_readtime
  from (
    select task_i,
      type,
      count(*) as type_count,
      count(case when end_ts is not null then 1 end) as type_read,
      avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
    from stats
    group by task_i, type
  )
)
pivot (max(type_count) as count, max(type_read) as read, max(type_readtime) as readtime
  for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d))
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME         B_COUNT     B_READ B_READTIME         C_COUNT     C_READ C_READTIME         D_COUNT     D_READ D_READTIME    
---------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min             2          1 24 hrs 5 min             1          1 24 hrs 5 min                                         
       222                                                                                      2          1 24 hrs 5 min             3          2 24 hrs 5 min   
       333          1          1 24 hrs 5 min             2          1 24 hrs 5 min                                                   2          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min             1          1 24 hrs 5 min             1          1 24 hrs 5 min             2          1 24 hrs 5 min   

... but if you don't want null values you'll need to replace them with nvl() calls in the final select list, which makes it messier again.
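One way to write that (a sketch only; the inner query is the pivot query shown above, repeated for self-containment):

select task_i,
  nvl(a_count, 0) as a_count, nvl(a_read, 0) as a_read,
  nvl(a_readtime, '0 min') as a_readtime,
  nvl(b_count, 0) as b_count, nvl(b_read, 0) as b_read,
  nvl(b_readtime, '0 min') as b_readtime,
  nvl(c_count, 0) as c_count, nvl(c_read, 0) as c_read,
  nvl(c_readtime, '0 min') as c_readtime,
  nvl(d_count, 0) as d_count, nvl(d_read, 0) as d_read,
  nvl(d_readtime, '0 min') as d_readtime
from (
  -- the pivot query from above, unchanged
  select * from (
    select task_i, type, type_count, type_read,
      case when type_readtime > 1 then trunc(24 * type_readtime) || ' hrs ' end
        || round(60 * mod(24 * type_readtime, 1)) || ' min'
      as type_readtime
    from (
      select task_i, type,
        count(*) as type_count,
        count(case when end_ts is not null then 1 end) as type_read,
        avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
      from stats
      group by task_i, type
    )
  )
  pivot (max(type_count) as count, max(type_read) as read, max(type_readtime) as readtime
    for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d))
)
order by task_i;

Since the pivoted readtime columns are already strings, nvl() substitutes the text '0 min' rather than a number.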
