Mathias Nervik Mathias Nervik - 1 year ago 61
PHP Question

Speed up a slow SQL query on a large database

This is a picture-type webpage

The code should select the next tag with the least amount of unrated images.

Table info

table_name = images

  • filename (varchar255)

  • pick (tinyint)

  • reject (tinyint)

table_name = image_tags

  • filename (varchar255)

  • tag_name (varchar255)


The flowing columns has indexes:

  • images.filename

  • images.pick

  • images.reject

  • image_tags.filename


The code should go through every row and check how many duplicate tag_name-column there are. Afterwards it should return the tag with the least number of duplicate rows. The code should ignore any rows where
pick = TRUE
reject = TRUE

Meaning the code should only include files if
pick = FALSE
reject = FALSE


At the moment my code is the folowing.

SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags JOIN images ON image_tags.filename = images.filename WHERE images.pick = FALSE AND images.reject = FALSE GROUP BY image_tags.tag_name ORDER BY number_of_rows


Because of the size of the database the query takes a lot of time (sometimes as much as 30 seconds). Is there a way to make the query run faster, or is there another query that would have less execution time?


  • images-table: 700740 rows

  • image_tags-table: 823852 rows

Answer Source

This is your query:

SELECT it.tag_name, COUNT(*) as number_of_rows
FROM image_tags it JOIN
     images i
     ON it.filename = i.filename
WHERE i.pick = FALSE AND i.reject = FALSE
GROUP BY it.tag_name
ORDER BY number_of_rows;

The first thing to try is an index on images(pick, reject, filename). This probably won't help, because boolean columns don't usually whittle down the data enough. But it is worth a try.

The second thing to try is this query:

SELECT it.tag_name,
       (SELECT COUNT(*) 
        FROM images i
        WHERE it.filename = i.filename AND
              i.pick = FALSE AND
              i.reject = FALSE
       ) as number_of_rows
FROM image_tags it 
ORDER BY number_of_rows;

This moves the aggregation to a subquery, which sometimes performs better in MySQL. For this, you want an index on images(filename, pick, reject). Also note that this returns all tags, not just those that have a matching image. You can filter using HAVING number_of_rows > 0 if you want the same results as your version.