trembling trembling - 7 months ago 32
SQL Question

count function per row

I have to provide a list of distinct sites that are active, who have one or more domain, and whose domains are all​ deleted. Here is my query so far.

SELECT DISTINCT *
FROM sites JOIN domains ON domains.site = sites.id
WHERE domains.is_deleted = 1 AND sites.is_deleted = 0


From my research, it seems like the best way to check if a site has more than one domain is to have a
COUNT()
subquery. How can I use
COUNT()
to count the number of domains per site?

Here is a SQL fiddle.

Answer

Q: How can I use COUNT() to count the number of domains per site?

(answer to the question is at the bottom of the answer.)


There are several different queries that will return the specified result.

I'd build the query like this, starting with

"List of distinct sites that are active"

We can get the rows from sites table where the (non-null) is_deleted column is 0...

SELECT s.id
     , s.name
     , s.company
     , s.association
     , s.is_supercharged
     , s.is_deleted
  FROM sites s
 WHERE NOT s.is_deleted
 ORDER BY s.id 

"who have one or more domain"

We can write a query that returns a list of the values of the site (FK) column from domain table

SELECT d.site
  FROM domain d
 GROUP BY d.site

"and whose domains are all​ deleted"

We can write another query that returns a distinct list of values from the site column of rows in domain which have is_deleted = 0

SELECT a.site
  FROM domain a
 WHERE NOT a.is_deleted
 GROUP BY a.site

And we can put those three queries together. We can turn the last two queries into inline views (wrap thme in parens and reference them like they were tables.) We can use an inner join to (sites with at least one domain) to get only sites that have at least one domain. And use an anti-join pattern to (sites that have an active domain), to exclude sites that do have an active domain.

For example:

SELECT s.id
     , s.name
     , s.company
     , s.association
     , s.is_supercharged
     , s.is_deleted
  FROM sites s
  JOIN ( SELECT d.site AS site_id
           FROM domain d
          GROUP BY d.site
       ) r
    ON r.site_id = s.id
  LEFT
  JOIN ( SELECT a.site AS site_id
           FROM domain a
          WHERE NOT a.is_deleted
          GROUP BY a.site
       ) q
    ON q.site_id = s.id
 WHERE q.site_id IS NULL
   AND NOT s.is_deleted
 ORDER BY s.id

This is just one of several different query patterns that will return the specified result.


We could also write query like this, to return a distinct list of site, where the "count" of domains is equal to the "count" of domains that are deleted

SELECT d.site
  FROM domain d
 GROUP BY d.site
HAVING SUM(IF(d.is_deleted,1,0)) = SUM(1)

With that, we could just use an inner join to site...

SELECT s.id
     , s.name
     , s.company
     , s.association
     , s.is_supercharged
     , s.is_deleted
  FROM sites s
  JOIN ( SELECT d.site AS site_id
           FROM domain d
          GROUP BY d.site
         HAVING SUM(IF(d.is_deleted,1,0)) = SUM(1)
       ) q
    ON q.site_id = s.id
 WHERE NOT s.is_deleted
 ORDER BY s.id

There are several other patterns that return an equivalent result.


Q: How can I use COUNT() to count the number of domains per site?

To get a count of domains per site for sites that have at least one domain, without regard to whether the site itself is_deleted:

SELECT d.site
     , COUNT(1) AS count_domains
  FROM domain d
 GROUP BY d.site

Similarly, to get a count of "is_deleted" domains per site (for sites that have at least one domain)

SELECT d.site
     , COUNT(IF(d.is_deleted,1,NULL)) AS count_deleted_domains
  FROM domain d
 GROUP BY d.site

We can combine those queries, and return both counts on a row for a site.

Those queries omit "counts" of zero. To get the zero counts, and include only sites that are not is_deleted:

SELECT s.id                           AS `site`
     , COUNT(d.site)                  AS `count_domains`
     , COUNT(IF(d.is_deleted,1,NULL)) AS `count_deleted_domains`
  FROM site s
  LEFT
  JOIN domain d
    ON d.site = s.id
 WHERE NOT s.is_deleted
 GROUP BY s.id
 ORDER BY s.id

We can also reference the results from the COUNT() aggregates in a HAVING clause.

This query will return all sites that don't have any "active" domain, including the sites that don't have any related domain.

SELECT s.id                           AS `site`
     , COUNT(d.site)                  AS `count_domains`
     , COUNT(IF(d.is_deleted,1,NULL)) AS `count_deleted_domains`
  FROM site s
  LEFT
  JOIN domain d
    ON d.site = s.id
 WHERE NOT s.is_deleted
 GROUP BY s.id
HAVING COUNT(d.site) = COUNT(IF(d.is_deleted,1,NULL))
 ORDER BY s.id

We could easily modify that to return the specified list of sites (add references to columns from the site table into the SELECT list) and omit the COUNT() expressions from the SELECT list.

If we only want sites that have at least one domain, we can simply remove the LEFT keyword, to make it an inner join rather than an outer join.

Comments