Eric Laboy Eric Laboy - 6 months ago 11
SQL Question

how to increase performance of summing a column

I am trying to tune my query but I can't go further. is there any chance to tune more this query? especially SUM sub query.

INDEX : db_prices.date

Example of original query:

SELECT
db_villas.id,
db_villas.title1,
db_specials.id AS sid,
db_specials.title1 AS stitle,
db_cities.name AS cityName,
db_counties.name AS countyName,
db_assets.path,
db_villas.bathroom,
db_villas.bedroom,
db_villas.guest,
db_prices.date,
(SELECT SUM(db_prices.price) FROM db_prices WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.villaId=db_villas.id) AS price
FROM
db_villas
INNER JOIN db_cities ON db_villas.cityId = db_cities.id
LEFT OUTER JOIN db_specials ON db_villas.specialId = db_specials.id
INNER JOIN db_counties ON db_counties.cityid = db_cities.id AND db_villas.countyId = db_counties.id
INNER JOIN db_assets ON db_assets.guid = db_villas.guid
INNER JOIN db_villafacilities ON db_villafacilities.villaId = db_villas.id
INNER JOIN db_prices ON db_prices.villaId = db_villas.id
WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_assets.isMainImage=1 AND db_villas.minRent <= 7
GROUP BY db_villas.id
HAVING (SELECT COUNT(*) FROM db_prices WHERE date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.isFree = 0 AND db_prices.villaId = db_villas.id)=0


the query at above executed in 1.2 seconds.

When I remove

(SELECT SUM(db_prices.price) FROM db_prices WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.villaId=db_villas.id) AS price


sub query execution time reduced up to 0.009 seconds.

If I just remove this part

AND db_prices.villaId=db_villas.id


from the sub query it's still executed in 0.009 seconds.

Answer

MySQL (as of v 5.7) has a query planner without the chops to transform your dependent subquery

 (SELECT SUM(db_prices.price) 
    FROM db_prices
   WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30"
     AND db_prices.villaId=db_villas.id) AS price

into a joinable independent subquery. That means the query planner ends up running that query many times, using up time. So you need to do it yourself. The independent subquery will look like this:

                 SELECT villaId, 
                        SUM(price) price,  
                        SUM(CASE WHEN isFree = 0 THEN 1 ELSE 0 END) not_free_count
                   FROM db_prices
                  WHERE date BETWEEN  '2016-08-01' AND '2016-09-30'
                  GROUP BY villaId

This query gives you the sum of prices, and the count of villas not tagged isFree, for each villa. This is handy, because you can now JOIN this to the rest of your table. Like so:

 SELECT db_villas.id,
        db_villas.title1, etc etc,
        price_summary.price
   FROM db_villas
  INNER JOIN db_cities ON db_villas.cityId = db_cities.id
   LEFT OUTER JOIN db_specials ON db_villas.specialId = db_specials.id
           etc etc.
   LEFT JOIN (
                 SELECT villaId, 
                        SUM(price) price,  
                        SUM(CASE WHEN isFree = 0 THEN 1 ELSE 0 END) not_free_count
                   FROM db_prices
                  WHERE date BETWEEN  '2016-08-01' AND '2016-09-30'
                  GROUP BY villaId
       ) price_summary ON db_villas.villaId = price_summmary.villaId
 WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30"
   AND etc etc
   AND price_summary.not_free_count = 0
 GROUP BY db_villas.villaId

Then, you need the compound index on db_prices (date, villaId, price, isFree) to optimize your subquery. You may also need indexes on some other columns of other tables.

Pro tip: Lots of single-column indexes are no substitute for compound indexes in speeding up queries. Indexing lots of columns individually is a common, and notorious, antipattern. Read this: http://use-the-index-luke.com/

Pro tip: Your query is using the non-standard MySQL extension to GROUP BY. In a version of MySQL you may have soon, this will stop working unless you change some server settings. Read this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html