user5307298 user5307298 - 4 months ago 9
SQL Question

Three table joins does not yield correct result upon GROUP BY

I have 3 tables viz. actions, sources, and visits with some sample data given below:

1. actions:

enter image description here

2. sources:

enter image description here

3. visits:

enter image description here

Now, we want to know the number of visits through each source type and moreover also want to check how many visits are via contact and sale separately. The query still being used is:

SELECT COUNT(v.visits_id) AS visits, s.source_type AS source,
SUM(a.action_contact) AS contact, SUM(a.action_sale) AS sale FROM visits AS v JOIN sources AS s ON v.source_id = s.source_id JOIN actions AS a ON s.source_id = a.source_id
GROUP BY source ORDER BY visits DESC


Output:

enter image description here

So if you look into the output then you will find the number of visits are exactly equal to the sum of contact and sale for each row. So it seems to be OK. But if you sum up all the visits for all rows then it is not equal to 31. It is 47 in numbers which is not possible because we had only 31 visits in our visits table. Hence the output must not give us larger number of visits than that of visits table. This is the problem I have been facing in the query. How to bring up the correct result that must be in accordance to original tables?

Updated table actions:

enter image description here

Answer

It's because you're joining a many-to-many with actions to visits via sources. Ideally you would have a visit associated to each action, but you could also use the following:

SELECT s.source_type AS source
, count(v.visits_id) as visits
, ifnull((SELECT SUM(a1.action_contact) FROM actions a1 WHERE s.source_id = a1.source_id and a1.action_contact=1),0) as contacts
, ifnull((SELECT SUM(a2.action_sale) FROM actions a2 WHERE s.source_id = a2.source_id and a2.action_sale=1),0) as sales
FROM visits v 
JOIN sources s ON v.source_id = s.source_id 
GROUP BY source 
ORDER BY visits DESC

This will return one record per source, with the visits counted as per normal, but with contacts and sales summed as sub-queries so they're not being multiplied by the number of visits with the same source. If you're not familiar with many-to-many joins and the effect they have on the number of records (and therefore count and sum functions) you can see some examples here: http://www.phpknowhow.com/mysql/many-to-many-relationships/

Comments