LittleNooby LittleNooby - 6 months ago 13
SQL Question

a count for each join - optimisation

RESULTS :
I've used three methods :


  1. Three sub queries, 1 join in each (mine)

  2. Three sub queries, no join, filtering with where (SlimsGhost)

  3. Triple join (Solarflare)



I've made some stats with "explain" and "profiling" which explains the work each query must do and the following results weren't surprising : stats

Relative results :


  1. 100%

  2. 79%

  3. 1715%

    three sub queries with simple join
    three sub queries with where clause
    one query with triple join



ORIGINAL POST

The idea is to join 4 tables, using the same PK each time and then count how many rows each join would separately gives.

The obvious answer is to do each join... separately with sub queries.

But is it possible to do it with one query? Would it be more efficient?

select "LES CIGARES DU PHARAON" as "Titre",
(select count( payalb.idPays)
from album alb
left join pays_album payalb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Pays",
(select count( peralb.idPers)
from album alb
left join pers_album peralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Personnages",
(select count( juralb.idJur)
from album alb
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Jurons"
;
+------------------------+------+-------------+--------+
| Titre | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 3 | 13 | 50 |
+------------------------+------+-------------+--------+


table album rows : 22

table pays_album rows : 45

table personnage_album rows : 100

table juron_album rows : 1704

Here is what I tried :

select alb.titreAlb as "Titre",
sum(case when alb.idAlb=payalb.idAlb then 1 else 0 end) "Pays",
sum(case when alb.idAlb=peralb.idAlb then 1 else 0 end) "Personnages",
sum(case when alb.idAlb=juralb.idAlb then 1 else 0 end) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
;
+------------------------+------+-------------+--------+
| Titre | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 1950 | 1950 | 1950 |
+------------------------+------+-------------+--------+


but it counts the total number of rows of the full joined table, ... (1950 = 3 * 13 * 50)

schema : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_schema.svg

tables content : https://github.com/LittleNooby/gbd2015-2016/blob/master/tables_description

If you want to play to play with it :

db_init : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_ok.mysql

Answer

For optimization purposes, a good rule of thumb is to join less, not more. In fact, you should try to join as few rows as you can with as few rows as you can. With any additional join, you will multiply cost instead of adding cost. Because mysql will basically just generate a big multiplied matrix. A lot of that gets optimized away by indexes and other stuff though.

But to answer your question: it is actually possible to count with only one big join, assuming the tables have unique keys and idalb is a unique key for album. Then, and only then, you can do it similar to your code:

select alb.titreAlb as "Titre",
       count(distinct payalb.idAlb, payalb.PrimaryKeyFields) "Pays",
       count(distinct peralb.idAlb, peralb.PrimaryKeyFields) "Personnages",
       count(distinct juralb.idAlb, juralb.PrimaryKeyFields) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb

where PrimaryKeyFields stands for the primary key fields of the joined tables (you have to look them up).

Distinct will remove the effect the other joins have on the count. But unfortunately, in general, distinct will not remove the effect the joins have on the cost.

Although, if you have indexes that cover all (idAlb + PrimaryKeyFields)-fields of your tables, that might be even as fast as the original solution (because it can optimize the distinct to not do a sorting) and will come close to what you were thinking of (just walking through every table/index once). But in a normal or worst case szenario, it should perform worse than a reasonable solution (like SlimGhost's one) - because it is doubtful it will find the optimal strategy. But play around with it and check the explains (and post the findings), maybe mysql will do something crazy.