Ethan Strider Ethan Strider - 4 months ago 10
MySQL Question

Mysql compare average results with current result in a single query

I am wondering if it's possible to run a single query that tells me the following:



Get the names of any applications that have (currently executing) jobs exceeding the average historical runtime


Table 1: Job History

- Job start time

- Job end time

- Job status (success, failure, executing, etc.)

- Application ID that triggered job


Table 2: Applications

- ID

- Name


It seemed like a simple question to answer, but creating a mysql query has proven to be a rather involved (albeit intriguing) challenge...

The query needs to grab the jobs where
satus = executing
, use the
application.id
to look for other jobs that have a matching
application.id
and
status = success
, average the (
end_time
-
start_time
)
, and then compare the
average time
with the (
current_time
-
start_time
)
of the currently executing jobs. Finally, it must use the application ID from any jobs that fail this test to grab the
application.name
from the application table. Is that even possible to do in a single query?

For the sake of this question, let's assume
current_time
is passed in as an argument.

I made an attempt at a triple nested query, but I'm getting the following error, and I don't know why. I've spent a few hours trying to get this to work, but I'm already in over my head here:

ERROR 1054 (42S22): Unknown column 'jh.start_time' in 'having clause'


This is my attempt:

SELECT name FROM application
WHERE application.id IN (
SELECT application_id
FROM job_history AS jh
WHERE application_id IN (
SELECT application_id
FROM job_history
WHERE status='EXECUTION' )
AND jh.status='SUCCESS'
HAVING (avg(jh.end_time - jh.start_time)) < (current_time - jh.start_time)
);


Edit: As suggested, here is some sample data.

Table 1

+--------+------------+------------+----------+----------------+
| job_id | status | start_time | end_time | application_id |
+--------+------------+------------+----------+----------------+
| job1 | successful | 100 | 200 | app1 |
| job2 | failed | 150 | 350 | app2 |
| job3 | successful | 200 | 400 | app1 |
| job4 | execution | 500 | 0 | app1 |
| job5 | successful | 600 | 800 | app3 |
+--------+------------+------------+----------+----------------+


Table 2

+------+------------------+
| id | name |
+------+------------------+
| app1 | Team Green's app |
| app2 | Team Blue's app |
| app3 | Team Red's app |
+------+------------------+


I would like to grab job4, use the
application_id
to find job1 and job3. Then take the average run time of job1 and job3 and compare it against the current runtime of job4. If the current runtime is greater than the average runtime, then I would like to report the application name: Team Green's app.

Edit #2: Follow-up -- Weird arithmetic



(Thanks to @sgeddes for providing the solution.)

I'm trying to return the results of the query in a more human readable format: I added the average and actual run times (in minutes converted from milliseconds). However, I'm getting strange results. Sometimes, the
actual_run_time_in_min
will be smaller than the
avg_run_time_in_min
, even though that's impossible. It would NOT return the value if that were the case. I'm sure the issue has something to do with the way I'm doing the arithmetic, but I have no idea how to fix it.

I also added a multiplier (currently set to
1.4
) to the average run time to make sure that a given job has definitely exceeded the expected run time.

select distinct a.name,
TRUNCATE(((t.avgtime / 1000) % 60), 2) as avg_run_time_in_min,
TRUNCATE(((($CURRENT_TIME - jh.start_time) / 1000) % 60), 2) as actual_run_time_in_min
from application a
join job_history jh on a.id = jh.application_id
join (
select application_id, (avg(end_time - start_time) * $MULTIPLIER) as avgtime
from job_history
where status = 'SUCCESS'
group by application_id
) t on a.id = t.application_id
where jh.status = 'EXECUTING' and
($CURRENT_TIME - jh.start_time) > t.avgtime;


Here are some actual results:

+---------------+---------------------+------------------------+
| name | avg_run_time_in_min | actual_run_time_in_min |
+---------------+---------------------+------------------------+
| red-team | 26.50 | 48.03 |
| green-team | 44.75 | 5.74 |
+---------------+---------------------+------------------------+

Answer

If I'm understanding your question correctly, here's one option using a join. From here it's easy to adjust the where criteria to your needs:

select distinct a.name
from applications a 
    join jobhistory jh on a.id = jh.applicationid
    join (
      select applicationid, avg(endtime - starttime) avgtime
      from jobhistory
      where status = 'success'
      group by applicationid
    ) t on a.id = t.applicationid
where jh.status = 'execution' and 
  @current_time - jh.starttime > t.avgtime