user884248 user884248 - 6 months ago 11
MySQL Question

mysql: sub-query makes global query much slower than just using the sub-query's actual results

I have the following tables in a mysql database:


  1. articles - has a numerical AUTO_INCREMENT id field, and some text fields using fulltext indexes

  2. categories - has a numerical AUTO_INCREMENT id field, and some other fields that are not relevant

  3. articles_categories - connects articles to categoties using two foreign keys: article_id (pointing to the id field of articles), and category_id (pointing to the id field of categories)



Now, I'm running the following query:

select AC.category_id, count(*) as num_articles
from categories C
join articles_categories AC on C.id = AC.category_id
where AC.article_id in (
select id from articles where match(name, teaser, plaintext_contents) against ('some text' in boolean mode)
)
group by AC.category_id;


, and it's taking almost 3 seconds to run with several hunders of fields in each table. Here are the results of running this query with EXPLAIN:

1 SIMPLE C index PRIMARY PRIMARY 4 187 Using index; Using temporary; Using filesort
1 SIMPLE AC ref IX_articles_categories_category_id_article_id IX_articles_categories_category_id_article_id 4 kalir_co_il.C.id 56 Using where; Using index
1 SIMPLE articles eq_ref PRIMARY PRIMARY 4 kalir_co_il.AC.article_id 1 Using index condition; Using where


But here's the oddity.

The following subquery running in the middle of my query only takes 0.4 seconds:

select id from articles where match(name, teaser, plaintext_contents) against ('some text' in boolean mode)


so I'm thinking the rest of the query is what takes so long. But... when I replace the sub-query with its actual results, which are a list of 27 ids (numbers), I run the entire query, and it executes in less than 0.1 seconds!

select AC.category_id, count(*) as num_articles
from categories C
join articles_categories AC on C.id = AC.category_id
where AC.article_id in (61,
167,
352,
374,
407,
419,
455,
462,
467,
502,
575,
696,
746,
756,
759,
851,
1001,
1046,
1052,
1311,
1497,
1501,
1502,
1503,
1508,
1514,
1538
)
group by AC.category_id;


with the following results from EXPLAIN:

1 SIMPLE C index PRIMARY PRIMARY 4 187 Using index; Using temporary; Using filesort
1 SIMPLE AC ref IX_articles_categories_category_id_article_id IX_articles_categories_category_id_article_id 4 kalir_co_il.C.id 56 Using where; Using index


How does this make sense?

Note: I'm hoping to avoid posting actual table and index code, but I'll do it if I have to. Thanks to anyone willing to help...

Answer

Mysql's has a history of not handling subqueries amazingly well. There are a lot of intricacies that can play a role in this depending on your mysql version, flags etc. Take a look at this document for details - https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html

Also, have you tried rewriting your query as a join instead?

select AC.category_id, count(*) as num_articles
from categories C
join articles_categories AC on C.id = AC.category_id
join (
  select id from articles where match(name, teaser, plaintext_contents) against ('some text' in boolean mode)
) A_IDS ON A_IDS.id=AC.article_id

group by AC.category_id;

What does the performance look like then?

Comments