Paul Paul - 1 month ago 5
MySQL Question

How to select all dates without data using JOIN SQLs in MySQL?

I have 3 tables, users, sites, and site_traffic, respectively. The users table contains the name of the user and other details about the user. Each user has 1 or more sites which is stored in the sites table. Now every site has its own traffic data.

What I am trying to accomplish to select all the dates that has no traffic data for each site for all users. This should display all the user's names, the site_ids of each user and the date that has no data for each of those sites.

As of this query I am able to get the dates that have no data just for 1 specific user. How do I modify this query to list all the users and their sites and the dates that have no data for each site.

Here's my query:

SELECT b.dates_without_data
FROM (
SELECT a.dates AS dates_without_data
FROM (
SELECT CURDATE() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as dates
FROM (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as a
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as b
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as c
) a
WHERE a.dates >= DATE_SUB(DATE_SUB(NOW(),INTERVAL 1 DAY), INTERVAL 35 DAY)
) b
WHERE b.dates_without_data NOT IN (
SELECT recorded_on
FROM site_traffic, sites, users
WHERE site_traffic.site_id = sites.site_id
AND sites.user_id = users.user_id
AND users.user_id = 1
)
AND b.dates_without_data < DATE_SUB(NOW(),INTERVAL 1 DAY)
ORDER BY b.dates_without_data ASC


Thanks for your help guys.

Answer

I would use an anti-join pattern.

First, do a cross join operation between the generated list of possible dates and all sites. That gives us rows for every site, for every day. Then go ahead and do the join to the users table.

The trick is the anti-join. We take that set of all sites and all days, and then "match" to rows in site_traffic. We just want to return the rows that don't have a match. We can do that with an outer join, and then add a condition in the WHERE clause that excludes a row if it found a match. Leaving only rows that didn't have a match.

Something like this:

 SELECT s.site_id
      , u.user_id
      , d.dt       AS date_without_data
   FROM (

    SELECT DATE(NOW()) - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY AS dt
      FROM (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as a
      CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as b
      CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as c
    HAVING dt >= DATE(NOW()) + INTERVAL -1-35 DAY
       AND dt <  DATE(NOW()) + INTERVAL -1 DAY

        ) d
  CROSS
   JOIN site s
   JOIN users u
     ON u.user_id = s.user_id
  LEFT
  JOIN site_traffic t
    ON t.site_id      = s.site_id
    ON t.recorded_on >= d.dt
   AND t.recorded_on  < d.dt + INTERVAL 1 DAY
 WHERE t.site_id IS NULL

 ORDER BY s.site_id, u.user_id

The trick there is the condition in the WHERE clause. Any rows that found matching rows in site_traffic will have a non-NULL value for site_id. (The equality comparison to site_id in the join condition guarantees us that.) So if we exclude all rows that have non-NULL values, we are left with the rows that didn't have a match.

(I assumed that recorded_on was a datetime, so I used a range comparison... to match any value of recorded_on within the given date. If recorded_on is actually a date (with no time) then we could just do a simpler equality comparison.)

Add to the SELECT list whatever expressions you need, from the u and s tables.

Some people suggest that the inline view d (to generate a list of "all dates") looks kind of messy. But I'm fine with it.

It would be nice if MySQL provided a table valued function, or some other "prettier" mechanism for generating a series of integer values.

I would include all of the conditions on date within the view query itself, get it done inside the view, and not have to muck with the outer query.