Mathias Nervik Mathias Nervik - 2 months ago 7
PHP Question

Speed up a slow SQL query on a large database

This is a picture-type webpage



The code should select the next tag with the least amount of unrated images.

Table info



table_name = images




  • filename (varchar255)

  • pick (tinyint)

  • reject (tinyint)



table_name = image_tags




  • filename (varchar255)

  • tag_name (varchar255)



Indexes



The flowing columns has indexes:


  • images.filename

  • images.pick

  • images.reject

  • image_tags.filename



Functionality



The code should go through every row and check how many duplicate tag_name-column there are. Afterwards it should return the tag with the least number of duplicate rows. The code should ignore any rows where
pick = TRUE
or
reject = TRUE


Meaning the code should only include files if
pick = FALSE
and
reject = FALSE


Code



At the moment my code is the folowing.

SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags JOIN images ON image_tags.filename = images.filename WHERE images.pick = FALSE AND images.reject = FALSE GROUP BY image_tags.tag_name ORDER BY number_of_rows


Question



Because of the size of the database the query takes a lot of time (sometimes as much as 30 seconds). Is there a way to make the query run faster, or is there another query that would have less execution time?

Size




  • images-table: 700740 rows

  • image_tags-table: 823852 rows


Answer

This is your query:

SELECT it.tag_name, COUNT(*) as number_of_rows
FROM image_tags it JOIN
     images i
     ON it.filename = i.filename
WHERE i.pick = FALSE AND i.reject = FALSE
GROUP BY it.tag_name
ORDER BY number_of_rows;

The first thing to try is an index on images(pick, reject, filename). This probably won't help, because boolean columns don't usually whittle down the data enough. But it is worth a try.

The second thing to try is this query:

SELECT it.tag_name,
       (SELECT COUNT(*) 
        FROM images i
        WHERE it.filename = i.filename AND
              i.pick = FALSE AND
              i.reject = FALSE
       ) as number_of_rows
FROM image_tags it 
ORDER BY number_of_rows;

This moves the aggregation to a subquery, which sometimes performs better in MySQL. For this, you want an index on images(filename, pick, reject). Also note that this returns all tags, not just those that have a matching image. You can filter using HAVING number_of_rows > 0 if you want the same results as your version.