Mark Mark - 25 days ago 6
SQL Question

Simple Select Group By using filesort & tempory, not index

So I have combed the web and can't seem to find an answer. I have a table with the following structure

Table structure for table `search_tags`
--

CREATE TABLE IF NOT EXISTS `search_tags` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`LOOK_UP_TO_CAT_ID` int(11) NOT NULL,
`SEARCH_TAG` text COLLATE utf8_unicode_520_ci NOT NULL,
`DATE` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`SOURCE` varchar(225) COLLATE utf8_unicode_520_ci NOT NULL,
`SOURCE_ID` int(11) NOT NULL,
`WEIGHT` int(11) NOT NULL DEFAULT '1000',
PRIMARY KEY (`ID`),
KEY `LOOK_UP_TO_CAT_ID` (`LOOK_UP_TO_CAT_ID`),
KEY `WEIGHT` (`WEIGHT`),
FULLTEXT KEY `SEARCH_TAG` (`SEARCH_TAG`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_520_ci AUTO_INCREMENT=1 ;


The table sits with 800000+ rows and is growing.

When I run a query with a group by on
LOOK_UP_TO_CAT_ID
it takes between 1-2 seconds for the query to run. I need to run multiple versions of this base with joins to other tables but this seems to be where the bottleneck lies as adding joins to this doesn't slow it down

SELECT LOOK_UP_TO_CAT_ID, WEIGHT
FROM `search_tags`
WHERE `SEARCH_TAG` LIKE '%metallica%'
GROUP BY `LOOK_UP_TO_CAT_ID`


Removing the
GROUP BY
drops the query time down to 0.1 which seems much more acceptable but then I land up with duplicates.

Using explain with the group by shows that it's creating a temporary table rather than using the index

+----+-------------+-------------+------+-------------------+------+---------+------+--------+----------------------------------------------+--+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+----+-------------+-------------+------+-------------------+------+---------+------+--------+----------------------------------------------+--+
| 1 | SIMPLE | search_tags | ALL | LOOK_UP_TO_CAT_ID | NULL | NULL | NULL | 825087 | Using where; Using temporary; Using filesort | |
+----+-------------+-------------+------+-------------------+------+---------+------+--------+----------------------------------------------+--+


So I'm not sure if mysql is doing the right thing here or not, but to me at least it seems wrong not to use the index. What would be the best why to speed this query up?

Edit:

Heres an example of my data:

+----+-------------------+----------------------------------+------------+---------------+-----------+--------+
| ID | LOOK_UP_TO_CAT_ID | SEARCH_TAG | DATE | SOURCE | SOURCE_ID | WEIGHT |
+----+-------------------+----------------------------------+------------+---------------+-----------+--------+
| 1 | 521 | METALLICA | 2017-02-18 | artist | 15 | 1 |
| 2 | 521 | METALLICA - NOTHING ELSE MATTERS | 2017-02-18 | tracklisting | 22 | 2 |
| 3 | 522 | METALLICA | 2017-02-18 | artist | 15 | 1 |
| 4 | 522 | METALLICA - ST. Anger | 2017-02-18 | product_title | 522 | 2 |
+----+-------------------+----------------------------------+------------+---------------+-----------+--------+


Desired Result

+-------------------+--------+
| LOOK_UP_TO_CAT_ID | WEIGHT |
+-------------------+--------+
| 521 | 1 |
| 522 | 1 |
+-------------------+--------+

Answer Source

A few suggestions for you.

  1. SEARCH_TAG LIKE '%metallica%' will never, in this world of woe, use an index. The pattern haystack like '%needle' (leading %) requires MySQL to examine every value in the column for a match. haystack LIKE 'needle%' (trailing %)does not have this problem.

  2. You have a FULLTEXT index on your SEARCH_TAG column, so use it! WHERE MATCH('metallica') AGAINST SEARCH_TAG is the form of the WHERE clause you need.

  3. You have lots of single-column indexes on your table. These are generally unhelpful for making queries faster unless they happen to match exactly what you're trying to do. You'll be better off using compound covering indexes designed for the queries you're running.

The example query in your question is

   SELECT LOOK_UP_TO_CAT_ID, WEIGHT
    FROM  search_tags 
   WHERE  SEARCH_TAG LIKE  '%metallica%'
GROUP BY  LOOK_UP_TO_CAT_ID

If you change it to this it will make more SQL sense and run faster.

   SELECT  LOOK_UP_TO_CAT_ID, MAX(WEIGHT)
     FROM  search_tags 
    WHERE  SEARCH_TAG LIKE  'metallica%'
 GROUP BY  LOOK_UP_TO_CAT_ID

(Notice I got rid of the leading %.)

If you add a compound covering index on (SEARCH_TAG, LOOK_UP_CAT_ID, WEIGHT) this query will become quite fast. The entire query can be satisfied from the index. MySQL random-accesses the index to find your SEARCH_TAG, then does a loose index scan to get the results you requested.

( An aside: don't worry when you see filesort in EXPLAIN output in a GROUP BY or ORDER BY query. It's part of the way MySQL satisfies the query. The file in filesort doesn't necessarily mean a slow file on a hard drive. )