nasoj1100 nasoj1100 - 6 months ago 8
SQL Question

How to make this long sql query a full join?

I have the following sql query. It is used to get stats on users who only logged in once this week. My problem is that I am missing some data. When I run a simple query to see how many users only logged in once this week I get five rows but this query only returns four rows. I assume this is because the tables are only left joined. As I am creating tables in the query I am having trouble and keep getting errors when trying to add in the union statement to make it a full join. Here is the query any help appreciated.

SELECT a.user_id,
a.logins,
a._date,
COALESCE(b.loaded, 0) loaded,
COALESCE(c.attempted, 0) attempted,
COALESCE(d.correct, 0) correct
FROM (SELECT l.user_id,
l.in_datetime,
Date_format(l.in_datetime, '%d/%m/%Y') _date,
Count(*) AS logins
FROM production.login l
GROUP BY user_id) a
LEFT JOIN (SELECT user_id,
Count(*) AS loaded
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
GROUP BY user_id) b
ON a.user_id = b.user_id
LEFT JOIN (SELECT user_id,
Count(*) AS attempted
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
WHERE s.selected_answer IS NOT NULL
GROUP BY user_id) c
ON c.user_id = b.user_id
LEFT JOIN (SELECT user_id,
Count(*) AS correct
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
WHERE s.selected_answer = s.correct_answer
GROUP BY user_id) d
ON c.user_id = d.user_id
WHERE logins = 1
AND Year(a.in_datetime) = Year(Curdate())
AND Week(a.in_datetime) = Week(Curdate())

Answer

I don't think the problem has anything to do with full joins. The issue is that you need to move your logins date filter into the table expression. Your query wouldn't have run on systems that correctly disallow the return of a non-aggregate column in a grouping query. The single login condition can also be expressed with having. And I've consolidated the rest of the logic in a single subquery which should make it a lot simpler to follow.

select
    ...
from
    (
        select user_id, min(date_format(in_datetime, '%d/%m/%Y')) _date,
        from production.login
        where year(in_datetime) = year(curdate()) and week(in_datetime) = week(curdate())
        group by user_id
        having count(*) = 1
    ) users
        left outer join
    (
        select
            s.user_id, /* I qualified with s but not sure that was the right table */
            count(*) as loaded,
            count(s.selected_answer) as attempted,
            count(case when s.selected_answer = s.corrected_answer then 1 end) as correct
        from production.score s inner join processedquestion pq
            on pq.attempt_id = s.attempt_id
        group by user_id
    ) questions
        on questions.user_id = users.user_id

I have no idea how large your logins table but the query might run more efficiently if you were to calculate a start and end date and use in_datetime between <start_of_week> and <end_of_week>.