Omnus Ruthius Omnus Ruthius - 6 months ago 13
SQL Question

Oracle SQL Discrepancy: COUNT(*) vs Actual Result Set

I have an application that keeps track when a file is being “attempted” to move from one server to another, as well as when it has “succeeded” or “failed.” An "attempt" should always be paired with a "success" or "failure." However, there are 63 “orphaned” attempts, meaning there have been attempts without any success or failure reported. My first query shows where I got the 63 number to begin with: I take a count of all of the attempts and subtract the successes and failures-

SELECT
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
-
(
SELECT
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
+
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
) FROM dual
) AS orphaned_attempts FROM dual;


So my second query to get the specific e_id of the 63 attempts is as follows:

SELECT * FROM
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
);


What I need (and expect based on the first query’s result set) is to have a 63-row result set with one column containing the e_id of the orphaned attempts. Instead, I am getting only 49 rows back from the second query. Any help is greatly appreciated.

Answer
SELECT
    a.e_id,
    coalesce(attempts, 0) attempts,
    coalesce(successes, 0) successes,
    coalesce(failures, 0) failures
FROM
    (
        SELECT e_id, count(*) as attempts FROM e_table
        WHERE e_comment LIKE '%attempt%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) a
    full outer join
    (
        SELECT e_id, count(*) as successes FROM e_table
        WHERE e_comment LIKE '%success%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) s
        on s.e_id = a.e_id
    full outer join
    (
        SELECT e_id, count(*) as failures FROM e_table
        WHERE e_comment LIKE '%failure%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) f
        on f.e_id = coalesce(a.e_id, s.e_id)
WHERE
    coalesce(attempts, 0) <> coalesce(successes, 0) + coalesce(failures, 0)

I changed to full outer joins so you can verify that there are no successes and/or failures without any matching attempt. This should let you find e_ids where something's going wrong in the logging. It should be easier to start dealing with finer numbers and not just listings of id values.

Others have pointed out the potential for multiple attempts on the same id but is it conceivable that a success and failure could both be recorded the same way as say in some kind of retry scenario? We don't know what the full comments look like. As a possible explanation, can a single comment can contain more than one of the words "attempt", "success", "failure"?

Here's something else to consider: Are you sure that all your success and failures events will fall within the same date window? In other words, is there some delay following the attempt? It might not have to be very long if this happens around midnight. You may want to widen the success and failure ranges enough to compensate for this (and change to left outer joins.)

Note: Condition in the where clause has been modified to allow for multiple attempts (as noted in comments) and now just looks for a balance in the number of attempts vs. successes and failures.

Comments