ere ere - 3 months ago 10
SQL Question

refactoring postgres join vs except

I'm trying to refactor one of my queries and I'm just not doing something quite right.

I would like to combine two queries and make one, but I'm confused on how it works with a LEFT JOIN.


All QuizMasters who have a state of "active"


Minus (-)


QuizMasters who have an "active" event on a given day (not all QuizMasters have events only ~25%).


Definitions



  • Events store
    start_at
    dow/wday for a given event, eg Monday-Sunday (albeit as a DateTime, only the wday and time are relevant).

  • Events and QuizMasters have states which are either "active" or not.




Old query (which data is correct)



SELECT first_name, last_name, email
FROM quiz_masters
WHERE quiz_masters.state = 'active' # (175 rows)

EXCEPT

SELECT first_name, last_name, email
FROM quiz_masters
LEFT JOIN events ON events.quiz_master_id = quiz_masters.id
WHERE quiz_masters.state = 'active'
AND EXTRACT(dow FROM events.start_at::timestamp::date) = 3 AND events.state = 'active'
GROUP BY first_name, last_name, email # (- 20 rows)


Total of 155 rows match the query.

Combined query which doesn't work



I would like to combine them into something like:

SELECT first_name, last_name, email
FROM quiz_masters
LEFT JOIN events ON events.quiz_master_id = quiz_masters.id
WHERE quiz_masters.state = 'active'
AND events.quiz_master_id IS null
OR (EXTRACT(dow FROM events.start_at::timestamp::date) <> 3 AND events.state = 'active')
GROUP BY first_name, last_name, email


144 Rows (missing 11 rows)

But I'm not sure how to keep all the rows from some
quiz_masters
who are active but have no event whatsoever. It still removes them. Maybe I need some other kind of join?

Answer

In the first query you exclude all events that are active on Wednesday, so inactive events on any day are included. In the combined query you include all events that are active on any day but Wednesday and no inactive events whatsoever. That's your 11 rows difference.

This should get you back to 155 rows:

SELECT DISTINCT first_name, last_name, email
FROM quiz_masters
LEFT JOIN (
    SELECT quiz_master_id AS id, state
    FROM events
    WHERE EXTRACT(dow FROM events.start_at::timestamp) = 3
    AND events.state = 'active') ev USING (id)
WHERE quiz_masters.state = 'active'
AND ev.state IS NULL;

Apparently you have multiple entries for your quiz masters, but rather than doing a GROUP BY you should select DISTINCT rows. GROUP BY should only be used with aggregate functions.

Comments