Ethan Strider Ethan Strider - 2 months ago 6x
MySQL Question

Mysql compare average results with current result in a single query

I am wondering if it's possible to run a single query that tells me the following:

Get the names of any applications that have (currently executing) jobs exceeding the average historical runtime

Table 1: Job History

- Job start time

- Job end time

- Job status (success, failure, executing, etc.)

- Application ID that triggered job

Table 2: Applications

- ID

- Name

It seemed like a simple question to answer, but creating a mysql query has proven to be a rather involved (albeit intriguing) challenge...

The query needs to grab the jobs where
satus = executing
, use the
to look for other jobs that have a matching
status = success
, average the (
, and then compare the
average time
with the (
of the currently executing jobs. Finally, it must use the application ID from any jobs that fail this test to grab the
from the application table. Is that even possible to do in a single query?

For the sake of this question, let's assume
is passed in as an argument.

I made an attempt at a triple nested query, but I'm getting the following error, and I don't know why. I've spent a few hours trying to get this to work, but I'm already in over my head here:

ERROR 1054 (42S22): Unknown column 'jh.start_time' in 'having clause'

This is my attempt:

SELECT name FROM application
SELECT application_id
FROM job_history AS jh
WHERE application_id IN (
SELECT application_id
FROM job_history
AND jh.status='SUCCESS'
HAVING (avg(jh.end_time - jh.start_time)) < (current_time - jh.start_time)

Edit: As suggested, here is some sample data.

Table 1

| job_id | status | start_time | end_time | application_id |
| job1 | successful | 100 | 200 | app1 |
| job2 | failed | 150 | 350 | app2 |
| job3 | successful | 200 | 400 | app1 |
| job4 | execution | 500 | 0 | app1 |
| job5 | successful | 600 | 800 | app3 |

Table 2

| id | name |
| app1 | Team Green's app |
| app2 | Team Blue's app |
| app3 | Team Red's app |

I would like to grab job4, use the
to find job1 and job3. Then take the average run time of job1 and job3 and compare it against the current runtime of job4. If the current runtime is greater than the average runtime, then I would like to report the application name: Team Green's app.

Edit #2: Follow-up -- Weird arithmetic

(Thanks to @sgeddes for providing the solution.)

I'm trying to return the results of the query in a more human readable format: I added the average and actual run times (in minutes converted from milliseconds). However, I'm getting strange results. Sometimes, the
will be smaller than the
, even though that's impossible. It would NOT return the value if that were the case. I'm sure the issue has something to do with the way I'm doing the arithmetic, but I have no idea how to fix it.

I also added a multiplier (currently set to
) to the average run time to make sure that a given job has definitely exceeded the expected run time.

select distinct,
TRUNCATE(((t.avgtime / 1000) % 60), 2) as avg_run_time_in_min,
TRUNCATE(((($CURRENT_TIME - jh.start_time) / 1000) % 60), 2) as actual_run_time_in_min
from application a
join job_history jh on = jh.application_id
join (
select application_id, (avg(end_time - start_time) * $MULTIPLIER) as avgtime
from job_history
where status = 'SUCCESS'
group by application_id
) t on = t.application_id
where jh.status = 'EXECUTING' and
($CURRENT_TIME - jh.start_time) > t.avgtime;

Here are some actual results:

| name | avg_run_time_in_min | actual_run_time_in_min |
| red-team | 26.50 | 48.03 |
| green-team | 44.75 | 5.74 |


If I'm understanding your question correctly, here's one option using a join. From here it's easy to adjust the where criteria to your needs:

select distinct
from applications a 
    join jobhistory jh on = jh.applicationid
    join (
      select applicationid, avg(endtime - starttime) avgtime
      from jobhistory
      where status = 'success'
      group by applicationid
    ) t on = t.applicationid
where jh.status = 'execution' and 
  @current_time - jh.starttime > t.avgtime