
Consolidate SQL query with 2 subselects with same where clause

My app uses a scores table with locationId, scoreDateTime, score, and comments columns. Users can score a location and optionally submit a comment. A small data set might look like this:

mysql> select locationId, scoreDateTime, score, comments from scores;
+-----------------------------+-------------------------+-------+--------------------------------+
| locationId                  | scoreDateTime           | score | comments                       |
+-----------------------------+-------------------------+-------+--------------------------------+
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 17:30:32.899 |     3 | asdfasf                        |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 18:28:46.221 |     3 |                                |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 18:29:56.395 |     3 | safasf                         |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 18:32:10.358 |     3 |                                |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 18:49:32.262 |     3 |                                |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 18:50:33.693 |     3 |                                |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 19:13:58.456 |     3 |                                |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ | 2016-04-17 19:28:10.435 |     3 | asdfasf                        |
| ChIJqZyf8O8F44kRhatfHL4GYe0 | 2016-04-17 23:20:28.857 |     3 | aasdfasfsfsd                   |
| ChIJqZyf8O8F44kRhatfHL4GYe0 | 2016-04-17 23:22:55.254 |     3 | asdfasfasfsafasfsfasf asdfasfd |
| ChIJqZyf8O8F44kRhatfHL4GYe0 | 2016-04-17 23:40:37.106 |     3 |                                |
| ChIJpbSR1a4I44kRemEzTpniis8 | 2016-04-19 11:17:41.836 |     5 | adfgadf                        |
| ChIJF1LAoqgI44kR5EWvRqJPUN4 | 2016-04-19 11:17:52.536 |     4 |                                |
+-----------------------------+-------------------------+-------+--------------------------------+


I'd like to build a single query that will get the following for each location:


  • a score count from the last X hours

  • a comment count from the last Y days

  • the latest scoreDateTime (or NULL) for any comments in the last Y days



My motivation is to show locations with their recent score counts, their historical comment counts, and their latest comment datetime (or NULL). This gives me both the recent running score counts and how "hot" the comment trail is.

The following query works. However, the locationId list, which appears twice, will be much longer in production. My question: is there a performant way to consolidate the two locationId lists, i.e. the two 'locationId in (...)' clauses?

select
    x.locationId, count1, count2, count3, count4, count5,
    IFNULL(commentCount, 0) as commentCount, lastCommentDateTime
from (
    select
        locationId,
        sum(if(score = 1, 1, 0)) count1,
        sum(if(score = 2, 1, 0)) count2,
        sum(if(score = 3, 1, 0)) count3,
        sum(if(score = 4, 1, 0)) count4,
        sum(if(score = 5, 1, 0)) count5
    from
        scores
    where
        scoreDateTime > '2016-04-16 21:38:51.843' and
        locationId in (
            'ChIJqZyf8O8F44kRbNWHQkDkpGQ',
            'ChIJqZyf8O8F44kRhatfHL4GYe0',
            'ChIJCes00a4I44kRKG8zB4KvYTM',
            'ChIJP-eRLq8I44kRKU6VOpTXqTM',
            'ChIJpbSR1a4I44kRemEzTpniis8',
            'ChIJF1LAoqgI44kRip2l7rjO2g4',
            'ChIJF1LAoqgI44kR5EWvRqJPUN4',
            'ChIJF1LAoqgI44kRRD_ZvPUmrGA',
            'ChIJjweq4h0G44kRWoCPQKPdrPM',
            'ChIJf2tVDB4G44kRTYjhl3sjm8M',
            'ChIJ_Vg4giEG44kRq2nvtjEn8yA',
            'ChIJP00qFSMG44kRyKcy2f_S12o'
        )
    group by locationId
) as x
left join (
    select
        locationId,
        count(comments) as commentCount,
        max(scoreDateTime) as lastCommentDateTime
    from
        scores
    where
        comments != "" and
        scoreDateTime > '2016-01-16 00:00:00.000' and
        locationId in (
            'ChIJqZyf8O8F44kRbNWHQkDkpGQ',
            'ChIJqZyf8O8F44kRhatfHL4GYe0',
            'ChIJCes00a4I44kRKG8zB4KvYTM',
            'ChIJP-eRLq8I44kRKU6VOpTXqTM',
            'ChIJpbSR1a4I44kRemEzTpniis8',
            'ChIJF1LAoqgI44kRip2l7rjO2g4',
            'ChIJF1LAoqgI44kR5EWvRqJPUN4',
            'ChIJF1LAoqgI44kRRD_ZvPUmrGA',
            'ChIJjweq4h0G44kRWoCPQKPdrPM',
            'ChIJf2tVDB4G44kRTYjhl3sjm8M',
            'ChIJ_Vg4giEG44kRq2nvtjEn8yA',
            'ChIJP00qFSMG44kRyKcy2f_S12o'
        )
    group by locationId
) as y
on x.locationId = y.locationId;


The results look like the following:

mysql> source ../../query3.sql
+-----------------------------+--------+--------+--------+--------+--------+--------------+-------------------------+
| locationId                  | count1 | count2 | count3 | count4 | count5 | commentCount | lastCommentDateTime     |
+-----------------------------+--------+--------+--------+--------+--------+--------------+-------------------------+
| ChIJF1LAoqgI44kR5EWvRqJPUN4 |      0 |      0 |      0 |      1 |      0 |            0 | NULL                    |
| ChIJpbSR1a4I44kRemEzTpniis8 |      0 |      0 |      0 |      0 |      1 |            1 | 2016-04-19 11:17:41.836 |
| ChIJqZyf8O8F44kRbNWHQkDkpGQ |      0 |      0 |      8 |      0 |      0 |            3 | 2016-04-17 19:28:10.435 |
| ChIJqZyf8O8F44kRhatfHL4GYe0 |      0 |      0 |      3 |      0 |      0 |            2 | 2016-04-17 23:22:55.254 |
+-----------------------------+--------+--------+--------+--------+--------+--------------+-------------------------+
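To check these numbers outside MySQL, the query above can be reproduced against the sample rows with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the original setup: SQLite stands in for MySQL, MySQL's IF() is rewritten as an equivalent CASE, comments != "" becomes comments <> '', timestamps are stored as TEXT (which compares chronologically in this format), and the helper name run_original is hypothetical. Binding the location list twice mirrors the duplication the question is about.

```python
import sqlite3

# Sample rows from the question; '' means "no comment" (the mysql output
# shows blanks, not NULL). ISO-style TEXT timestamps compare chronologically.
ROWS = [
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 17:30:32.899", 3, "asdfasf"),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:28:46.221", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:29:56.395", 3, "safasf"),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:32:10.358", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:49:32.262", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:50:33.693", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 19:13:58.456", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 19:28:10.435", 3, "asdfasf"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:20:28.857", 3, "aasdfasfsfsd"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:22:55.254", 3, "asdfasfasfsafasfsfasf asdfasfd"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:40:37.106", 3, ""),
    ("ChIJpbSR1a4I44kRemEzTpniis8", "2016-04-19 11:17:41.836", 5, "adfgadf"),
    ("ChIJF1LAoqgI44kR5EWvRqJPUN4", "2016-04-19 11:17:52.536", 4, ""),
]
LOCATIONS = [
    "ChIJqZyf8O8F44kRbNWHQkDkpGQ", "ChIJqZyf8O8F44kRhatfHL4GYe0",
    "ChIJCes00a4I44kRKG8zB4KvYTM", "ChIJP-eRLq8I44kRKU6VOpTXqTM",
    "ChIJpbSR1a4I44kRemEzTpniis8", "ChIJF1LAoqgI44kRip2l7rjO2g4",
    "ChIJF1LAoqgI44kR5EWvRqJPUN4", "ChIJF1LAoqgI44kRRD_ZvPUmrGA",
    "ChIJjweq4h0G44kRWoCPQKPdrPM", "ChIJf2tVDB4G44kRTYjhl3sjm8M",
    "ChIJ_Vg4giEG44kRq2nvtjEn8yA", "ChIJP00qFSMG44kRyKcy2f_S12o",
]
RECENT_CUTOFF = "2016-04-16 21:38:51.843"   # "last X hours" cutoff from the query
COMMENT_CUTOFF = "2016-01-16 00:00:00.000"  # "last Y days" cutoff from the query


def run_original():
    """Run the question's two-subquery LEFT JOIN; the location list is bound twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table scores (locationId text, scoreDateTime text, "
                 "score integer, comments text)")
    conn.executemany("insert into scores values (?, ?, ?, ?)", ROWS)
    in_list = ", ".join("?" for _ in LOCATIONS)
    # SQLite has no MySQL-style IF(), so each count uses an equivalent CASE.
    counts = ", ".join(
        f"sum(case when score = {n} then 1 else 0 end) as count{n}" for n in range(1, 6))
    sql = f"""
        select x.locationId, count1, count2, count3, count4, count5,
               ifnull(commentCount, 0) as commentCount, lastCommentDateTime
        from (select locationId, {counts}
              from scores
              where scoreDateTime > ? and locationId in ({in_list})
              group by locationId) as x
        left join
             (select locationId, count(comments) as commentCount,
                     max(scoreDateTime) as lastCommentDateTime
              from scores
              where comments <> '' and scoreDateTime > ? and locationId in ({in_list})
              group by locationId) as y
        on x.locationId = y.locationId
        order by x.locationId
    """
    params = [RECENT_CUTOFF, *LOCATIONS, COMMENT_CUTOFF, *LOCATIONS]
    return conn.execute(sql, params).fetchall()


for row in run_original():
    print(row)
```

Run against the sample rows, this should print the same four rows as the mysql output above, in the same order.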

Answer

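It looks like the only differences between your two subqueries are the scoreDateTime and comments criteria. One way to combine them is to move those conditions into the select list using conditional aggregation: the WHERE clause keeps the wider Y-day window, and the narrower X-hour window moves into each sum(). That way the locationId list appears only once.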

Also, MySQL evaluates a boolean expression to 1 or 0 (or NULL if an operand is NULL), so you can simplify your sum() calls by dropping the if() wrappers.
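A minimal illustration of that boolean-to-integer behavior, using Python's sqlite3 as a stand-in for MySQL (SQLite also evaluates comparisons to 1 or 0, and to NULL when an operand is NULL). The two-column table and its rows are a hypothetical cut-down version of the sample data; the NULL comment is added only to show how sum() treats it.

```python
import sqlite3

# Hypothetical cut-down sample data; the last row has a NULL comment.
conn = sqlite3.connect(":memory:")
conn.execute("create table scores (score integer, comments text)")
conn.executemany(
    "insert into scores values (?, ?)",
    [(3, "asdfasf"), (3, ""), (5, "adfgadf"), (4, ""), (3, None)],
)


def count_both(value):
    """Count rows with score == value, once via a bare comparison, once via CASE."""
    return conn.execute(
        "select sum(score = ?), sum(case when score = ? then 1 else 0 end) from scores",
        (value, value),
    ).fetchone()


print(count_both(3))  # (3, 3)
# comments <> '' yields 1, 0, or NULL; sum() skips the NULL row.
print(conn.execute("select sum(comments <> '') from scores").fetchone())  # (2,)
```

Both forms count the same rows, so the if() wrapper adds nothing but noise.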

select
    locationId,
    sum(score = 1 and scoreDateTime > '2016-04-16 21:38:51.843') count1,
    sum(score = 2 and scoreDateTime > '2016-04-16 21:38:51.843') count2,
    sum(score = 3 and scoreDateTime > '2016-04-16 21:38:51.843') count3,
    sum(score = 4 and scoreDateTime > '2016-04-16 21:38:51.843') count4,
    sum(score = 5 and scoreDateTime > '2016-04-16 21:38:51.843') count5,
    sum(comments != "") commentCount,
    max(case when comments != "" then scoreDateTime end) as lastCommentDateTime
from
    scores
where
    scoreDateTime > '2016-01-16 00:00:00.000' and
    locationId in (
        'ChIJqZyf8O8F44kRbNWHQkDkpGQ',
        'ChIJqZyf8O8F44kRhatfHL4GYe0',
        'ChIJCes00a4I44kRKG8zB4KvYTM',
        'ChIJP-eRLq8I44kRKU6VOpTXqTM',
        'ChIJpbSR1a4I44kRemEzTpniis8',
        'ChIJF1LAoqgI44kRip2l7rjO2g4',
        'ChIJF1LAoqgI44kR5EWvRqJPUN4',
        'ChIJF1LAoqgI44kRRD_ZvPUmrGA',
        'ChIJjweq4h0G44kRWoCPQKPdrPM',
        'ChIJf2tVDB4G44kRTYjhl3sjm8M',
        'ChIJ_Vg4giEG44kRq2nvtjEn8yA',
        'ChIJP00qFSMG44kRyKcy2f_S12o'
    )
group by locationId;

This query can take advantage of a composite index on (locationId, scoreDateTime), since both conditions in its WHERE clause are on those columns.
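A similar sqlite3 sketch of the consolidated query, under the same assumptions (SQLite in place of MySQL, comments <> '' in place of comments != "", TEXT timestamps, and a hypothetical helper name, run_consolidated). On the sample rows it returns the same four rows as the original two-subquery query. One behavioral difference is worth knowing: because the WHERE clause now uses the wider Y-day window, a location with a comment in the last Y days but no score in the last X hours also appears, with all five counts at 0, whereas the original left join omitted it. The optional HAVING clause below restores the original row set; in MySQL the equivalent would be having sum(scoreDateTime > '2016-04-16 21:38:51.843') > 0. The extra row used to show this is hypothetical test data.

```python
import sqlite3

# Sample rows from the question; '' means "no comment".
ROWS = [
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 17:30:32.899", 3, "asdfasf"),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:28:46.221", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:29:56.395", 3, "safasf"),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:32:10.358", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:49:32.262", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 18:50:33.693", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 19:13:58.456", 3, ""),
    ("ChIJqZyf8O8F44kRbNWHQkDkpGQ", "2016-04-17 19:28:10.435", 3, "asdfasf"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:20:28.857", 3, "aasdfasfsfsd"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:22:55.254", 3, "asdfasfasfsafasfsfasf asdfasfd"),
    ("ChIJqZyf8O8F44kRhatfHL4GYe0", "2016-04-17 23:40:37.106", 3, ""),
    ("ChIJpbSR1a4I44kRemEzTpniis8", "2016-04-19 11:17:41.836", 5, "adfgadf"),
    ("ChIJF1LAoqgI44kR5EWvRqJPUN4", "2016-04-19 11:17:52.536", 4, ""),
]
LOCATIONS = [
    "ChIJqZyf8O8F44kRbNWHQkDkpGQ", "ChIJqZyf8O8F44kRhatfHL4GYe0",
    "ChIJCes00a4I44kRKG8zB4KvYTM", "ChIJP-eRLq8I44kRKU6VOpTXqTM",
    "ChIJpbSR1a4I44kRemEzTpniis8", "ChIJF1LAoqgI44kRip2l7rjO2g4",
    "ChIJF1LAoqgI44kR5EWvRqJPUN4", "ChIJF1LAoqgI44kRRD_ZvPUmrGA",
    "ChIJjweq4h0G44kRWoCPQKPdrPM", "ChIJf2tVDB4G44kRTYjhl3sjm8M",
    "ChIJ_Vg4giEG44kRq2nvtjEn8yA", "ChIJP00qFSMG44kRyKcy2f_S12o",
]
RECENT_CUTOFF = "2016-04-16 21:38:51.843"   # "last X hours" cutoff
COMMENT_CUTOFF = "2016-01-16 00:00:00.000"  # "last Y days" cutoff


def run_consolidated(rows, drop_stale=False):
    """Run the single-pass conditional-aggregation query; the list is bound once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table scores (locationId text, scoreDateTime text, "
                 "score integer, comments text)")
    conn.executemany("insert into scores values (?, ?, ?, ?)", rows)
    in_list = ", ".join("?" for _ in LOCATIONS)
    # Optional: keep only locations with at least one score in the X-hour
    # window, which matches the row set of the original "x left join y".
    having = "having sum(scoreDateTime > ?) > 0" if drop_stale else ""
    sql = f"""
        select locationId,
               sum(score = 1 and scoreDateTime > ?) as count1,
               sum(score = 2 and scoreDateTime > ?) as count2,
               sum(score = 3 and scoreDateTime > ?) as count3,
               sum(score = 4 and scoreDateTime > ?) as count4,
               sum(score = 5 and scoreDateTime > ?) as count5,
               sum(comments <> '') as commentCount,
               max(case when comments <> '' then scoreDateTime end) as lastCommentDateTime
        from scores
        where scoreDateTime > ? and locationId in ({in_list})
        group by locationId
        {having}
        order by locationId
    """
    # Placeholder order: five count cutoffs, WHERE cutoff, IN list, HAVING cutoff.
    params = [RECENT_CUTOFF] * 5 + [COMMENT_CUTOFF] + LOCATIONS
    if drop_stale:
        params.append(RECENT_CUTOFF)
    return conn.execute(sql, params).fetchall()


for row in run_consolidated(ROWS):
    print(row)

# Hypothetical extra row: a comment within Y days but older than X hours,
# for a location that has no recent scores.
stale = ROWS + [("ChIJCes00a4I44kRKG8zB4KvYTM", "2016-03-01 12:00:00.000", 2, "older comment")]
print(len(run_consolidated(stale)))                   # 5: stale location included
print(len(run_consolidated(stale, drop_stale=True)))  # 4: HAVING drops it
```

Whether to keep those zero-count locations is a product decision: they carry a real comment count, but they were not part of the original result.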