yoprogramo yoprogramo - 2 months ago 11
MySQL Question

Optimize mysql query involving millions of rows

I have, in a project, a database with two big tables, "terminosnoticia" have 400 Million rows and "noticia" 3 Million. I have one query I want to make lighter (it spend from 10s to 400s):

SELECT noticia_id, termino_id
FROM noticia
LEFT JOIN terminosnoticia on terminosnoticia.noticia_id=noticia.id AND termino_id IN (7818,12345)
WHERE noticia.fecha BETWEEN '2016-09-16 00:00' AND '2016-09-16 10:00'
AND noticia_id is not null AND termino_id is not null;`


The only viable solution I have to explore is to denormalize the database to include the 'fecha' field in the big table, but, this will multiply the index sizes.

Explain plan:

+----+-------------+-----------------+--------+-----------------------+------------+---------+-----------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+-----------------------+------------+---------+-----------------------------------------+-------+-------------+
| 1 | SIMPLE | terminosnoticia | ref | noticia_id,termino_id | termino_id | 4 | const | 58480 | Using where |
| 1 | SIMPLE | noticia | eq_ref | PRIMARY,fecha | PRIMARY | 4 | db_resumenes.terminosnoticia.noticia_id | 1 | Using where |
+----+-------------+-----------------+--------+-----------------------+------------+---------+-----------------------------------------+-------+-------------+


Changing the query and creating the index as suggested, the explain plan is now:

+----+-------------+-------+--------+-------------------------------------------+---------------------+---------+---------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------+---------------------+---------+---------------------------+-------+-------------+
| 1 | SIMPLE | T | ref | noticia_id,termino_id,terminosnoticia_cpx | terminosnoticia_cpx | 4 | const | 60600 | Using index |
| 1 | SIMPLE | N | eq_ref | PRIMARY,fecha | PRIMARY | 4 | db_resumenes.T.noticia_id | 1 | Using where |
+----+-------------+-------+--------+-------------------------------------------+---------------------+---------+---------------------------+-------+-------------+


But the execution time does not vary too much...

Any idea?

Answer

As Strawberry pointed out, by having an "AND" in your where clause for NOT NULL is the same as a regular INNER JOIN and can be reduced to.

 SELECT 
       N.id as noticia_id, 
       T.termino_id
   FROM 
      noticia N
         JOIN terminosnoticia T
            on N.id = T.noticia_id
            AND T.termino_id IN (7818,12345) 
   WHERE 
      N.fecha BETWEEN '2016-09-16 00:00' AND '2016-09-16 10:00' 

Now, that said and aliases applied, I would suggest the following covering indexes as

table           index
Noticia         ( fecha, id )
terminosnoticia ( noticia_id, termino_id )

This way the query can get all the results directly from the indexes and not have to go to the raw data pages to qualify the other fields.