djCode - 2 years ago
SQL Question

# Calculations from table data

I am trying to calculate some counts and average timespans from the data in the following table:

```
+---------+--------+------+-----------+----------+
| STATS_I | TASK_I | TYPE | CREATE_TS | END_TS   |
+---------+--------+------+-----------+----------+
| 1       | 111    | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 2       | 111    | A    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 3       | 111    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 4       | 111    | B    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 5       | 111    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 6       | 222    | D    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 7       | 222    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 8       | 222    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 9       | 222    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 10      | 222    | C    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 11      | 333    | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 12      | 333    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 13      | 333    | B    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 14      | 333    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 15      | 333    | D    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 16      | 444    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 17      | 444    | D    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 18      | 444    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 19      | 444    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 20      | 444    | A    | 30-05-16  | 31-05-16 |
+---------+--------+------+-----------+----------+
```

The sample table can be populated with:

```
CREATE TABLE "STATS" ("STATS_I" NUMBER(10,0), "TASK_I" NUMBER(10,0), "TYPE" VARCHAR2(30), "CREATE_TS" DATE, "END_TS" DATE);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (1,111,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (2,111,'A',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (3,111,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (4,111,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (5,111,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (6,222,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (7,222,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (8,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (9,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (10,222,'C',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (11,333,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (12,333,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (13,333,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (14,333,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (15,333,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (16,444,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (17,444,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (18,444,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (19,444,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (20,444,'A',to_date('30-05-16','DD-MM-RR'),to_date('31-05-16','DD-MM-RR'));
```

• `end_ts` and `create_ts` are always kept at a 10-minute difference (the time portion is not reflected in the sample inserts).

The output I'm trying to get is:

```
+--------+--------------+-------------------+-------------------+-------+
| Task_i | A            | B                 | C                 | D     |
+        +--------------+-------------------+-------------------+-------+
|        | Count | Read | Count | Avg. Time | Count | Avg. Time | Count |
|        |       |      |       |   to Read |       |   to Read |       |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 111    | 2     | 1    | 2     | 10 min    | 1     | 10 min    | 0     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 222    | 0     | 0    | 0     | 0 min     | 3     | 10 min    | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 333    | 1     | 1    | 2     | 10 min    | 0     | 0 min     | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 444    | 1     | 1    | 1     | 10 min    | 1     | 10 min    | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
```

Where, for each `task_i`:

• 'Count' is the count of rows for the type;

• 'Read' is the count of rows for the type where `end_ts` is not null;

• 'Average time to read' is calculated as `(end_ts - create_ts) / count of type`, ignoring the rows where `end_ts` is null.
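As a concrete check of these definitions, here is a quick sketch in Python, using hypothetical minute values in place of the timestamps, for task `111`, type `B` (two rows, one of them read after 10 minutes):

```python
# Hypothetical stand-ins for the two task-111 type-B rows: (create, end)
# as minutes since midnight; None plays the role of a null end_ts.
rows = [
    (600, 610),   # read 10 minutes after creation
    (600, None),  # not read yet
]

count = len(rows)                                      # 'Count'
read = sum(1 for _, end in rows if end is not None)    # 'Read'
# 'Average time to read': ignore the rows where end_ts is null
avg_minutes = sum(end - start for start, end in rows if end is not None) / read

print(count, read, avg_minutes)  # 2 1 10.0
```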

So far I have tried to create four tables, one for each type, and then join them on `task_i`:

```
(((select count(*) as A from stats s   where s.type='A' group by s.TYPE  UNION ALL
select count(*) as B from stats s   where s.type='B' group by s.TYPE ) UNION ALL
select count(*) as C from stats s   where s.type='C' group by s.TYPE ) UNION ALL
select count(*) as D from stats s   where s.type='D' group by s.TYPE ) ;
```

but this doesn't produce anything like what I need:

```
         A
----------
         4
         5
         4
         7
```

What am I doing wrong, and how can I generate the required output?

The counts, which are the part you attempted, can be found with a conditional count (a `case` expression inside the `count()`) rather than with a union:

```
select task_i,
       count(case when type = 'A' then 1 end) as a_count,
       count(case when type = 'B' then 1 end) as b_count,
       count(case when type = 'C' then 1 end) as c_count,
       count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT    B_COUNT    C_COUNT    D_COUNT
---------- ---------- ---------- ---------- ----------
       111          2          2          1          0
       222          0          0          3          2
       333          1          2          0          2
       444          1          1          1          2
```
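If you want to experiment with the conditional-count shape outside Oracle, the same `count(case when ...)` trick works in SQLite (used here only because it ships with Python); this sketch loads just the five task-111 rows:

```python
import sqlite3

# In-memory stand-in for the STATS table, with a few of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table stats (stats_i int, task_i int, type text,"
             " create_ts text, end_ts text)")
conn.executemany("insert into stats values (?, ?, ?, ?, ?)", [
    (1, 111, 'A', '2016-05-30', '2016-05-30'),
    (2, 111, 'A', '2016-05-30', None),
    (3, 111, 'B', '2016-05-30', '2016-05-30'),
    (4, 111, 'B', '2016-05-30', None),
    (5, 111, 'C', '2016-05-30', '2016-05-30'),
])

# count(case when ... then 1 end) only counts rows where the case yields
# a non-null value, exactly as in the Oracle query.
result = conn.execute("""
    select task_i,
           count(case when type = 'A' then 1 end) as a_count,
           count(case when type = 'B' then 1 end) as b_count,
           count(case when type = 'C' then 1 end) as c_count,
           count(case when type = 'D' then 1 end) as d_count
    from stats
    group by task_i
""").fetchone()
print(result)  # (111, 2, 2, 1, 0)
```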

The number of 'reads', where `end_ts is not null`, can be found by modifying the case expression:

```
select task_i,
       count(case when type = 'A' then 1 end) as a_count,
       count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
       count(case when type = 'B' then 1 end) as b_count,
       count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
       count(case when type = 'C' then 1 end) as c_count,
       count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
       count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ    B_COUNT     B_READ    C_COUNT     C_READ    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1          2          1          1          1          0
       222          0          0          0          0          3          2          2
       333          1          1          2          1          0          0          2
       444          1          1          1          1          1          1          2
```

The elapsed time can be found the same way, using `avg()` instead of `count()`:

```
select task_i,
       count(case when type = 'A' then 1 end) as a_count,
       count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
       avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
       count(case when type = 'B' then 1 end) as b_count,
       count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
       avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
       count(case when type = 'C' then 1 end) as c_count,
       count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
       avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
       count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    B_COUNT     B_READ B_READTIME    C_COUNT     C_READ C_READTIME    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1     1.0035          2          1     1.0035          1          1     1.0035          0
       222          0          0                     0          0                     3          2     1.0035          2
       333          1          1     1.0035          2          1     1.0035          0          0                     2
       444          1          1     1.0035          1          1     1.0035          1          1     1.0035          2
```

The date arithmetic means those averages are shown as fractions of a day. (I've amended your data slightly, adding five minutes to each end time, to more closely match your original results.) To get the string output you need, you have to manipulate those values a bit, which is less repetitive if you use an inline view to do the calculation:

```
select task_i,
       a_count,
       a_read,
       case when a_readtime > 1 then trunc(24 * a_readtime) || ' hrs ' end
         || round(60 * mod(24 * nvl(a_readtime, 0), 1)) || ' min' as a_readtime
from (
  select task_i,
         count(case when type = 'A' then 1 end) as a_count,
         count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
         avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
         count(case when type = 'B' then 1 end) as b_count,
         count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
         avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
         count(case when type = 'C' then 1 end) as c_count,
         count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
         avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
         count(case when type = 'D' then 1 end) as d_count
  from stats
  group by task_i
)
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME
---------- ---------- ---------- ------------
       111          2          1 24 hrs 5 min
       222          0          0 0 min
       333          1          1 24 hrs 5 min
       444          1          1 24 hrs 5 min
```

... and repeat the outer query select list items for the same data for B, C and D.
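The trunc/mod arithmetic in that outer select is easy to sanity-check outside the database; this hypothetical Python helper mirrors it, taking the Oracle day-fraction difference as its input:

```python
def format_readtime(days):
    # Mirrors the SQL expression:
    #   case when rt > 1 then trunc(24 * rt) || ' hrs ' end
    #     || round(60 * mod(24 * nvl(rt, 0), 1)) || ' min'
    days = days or 0  # nvl(rt, 0): treat a missing average as zero
    hours_part = f"{int(24 * days)} hrs " if days > 1 else ""
    minutes = round(60 * ((24 * days) % 1))
    return f"{hours_part}{minutes} min"

# 1 day 5 minutes, expressed as a fraction of a day, as in the amended data.
print(format_readtime(1 + 5 / 1440))  # 24 hrs 5 min
print(format_readtime(None))          # 0 min
```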

If you're on 11g or higher you could use `pivot` to avoid repeating the translation of the time difference into a string:

```
select * from (
  select task_i,
         type,
         type_count,
         type_read,
         case when type_readtime > 1 then trunc(24 * type_readtime) || ' hrs ' end
           || round(60 * mod(24 * type_readtime, 1)) || ' min' as type_readtime
  from (
    select task_i,
           type,
           count(*) as type_count,
           count(case when end_ts is not null then 1 end) as type_read,
           avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
    from stats
    group by task_i, type
  )
)
pivot (
  max(type_count) as cnt,
  max(type_read) as reads,
  max(type_readtime) as readtime
  for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d)
)
order by task_i;
```

... but if you don't want null values you'll need to replace them with `nvl()` calls in the final select list, which makes it messier again.
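For comparison, a database without a `pivot` clause can still get the same two-step shape: aggregate once per `(task_i, type)`, then spread the types into columns with conditional `max()`. A sketch in SQLite (chosen only because it is bundled with Python), over a tiny hypothetical data set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stats (task_i int, type text, end_ts text)")
conn.executemany("insert into stats values (?, ?, ?)", [
    (111, 'A', '2016-05-30'),
    (111, 'A', None),
    (111, 'B', '2016-05-30'),
])

# Step 1 (inner query): one aggregated row per (task_i, type).
# Step 2 (outer query): pivot those rows into columns with max(case ...).
rows = conn.execute("""
    select task_i,
           max(case when type = 'A' then type_count end) as a_count,
           max(case when type = 'B' then type_count end) as b_count
    from (
        select task_i, type,
               count(*) as type_count,
               count(case when end_ts is not null then 1 end) as type_read
        from stats
        group by task_i, type
    )
    group by task_i
""").fetchall()
print(rows)  # [(111, 2, 1)]
```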