Matthew Lau Matthew Lau - 6 months ago 9
SQL Question

How to optimize this simple but slow query?

I have a traveler table with around 200,000 records containing the user number (pk), country, and time of departure (datetime). My goal is to calculate the number of travelers that departed within one hour of each traveler's departure time for the same country.

So my input look like

| Travel_no | Date_time | Country|
|-----------+------------------------|--------|
| 1 | 20160401150200 | 1 |
| 2 | 20160401160000 | 2 |
| 3 | 20160401010501 | 3 |
| 4 | 20160401090700 | 2 |
| 5 | 20160401155800 | 1 |


and my goal is to get

| Travel_no | Date_time | Country| country_within_hr_cnt|
|-----------+------------------------|--------|----------------------|
| 1 | 20160401150200 | 1 | 2 |
| 2 | 20160401160000 | 2 | 1 |
| 3 | 20160401010501 | 3 | 1 |
| 4 | 20160401090700 | 2 | 1 |
| 5 | 20160401155800 | 1 | 2 |


Right now I am using this query and its taking forever to run it ...

Select
Travel_no
,Date_time
,Country
,(Select Count(Travel_no) from #temp1 b
where
CAST(b.Date_time AS BIGINT) >= CAST(a.Date_time AS BIGINT) - 10000
and CAST(b.Date_time AS BIGINT) <= CAST(a.Date_time AS BIGINT) + 10000
and a.Country = b.Country
) 'country_within_hr_cnt'
FROM #temp1 a
GROUP BY
Travel_no , Date_time, Country


Do you guys know if there is a way to make this run faster? Would an index on Date_time work well?

Answer

Think of it this way: for each row in the grouped result of a, it has to scan the entire b table, because to evaluate the conditions, it must compute the cast. It will be fast when you arrange it so b is indexed on the column you're accessing in your condition, and your condition references the b column unadorned - i.e. the column values just as they're stored in b. You want your query to look like this:

Select 
Travel_no
,Date_time
,Country
,(Select Count(Travel_no) from #temp1 b
  where 
  b.Date_time >= a.Date_time - 10000 
  and b.Date_time <= a.Date_time + 10000 
  and a.Country = b.Country
  ) 'country_within_hr_cnt'
FROM #temp1 a
GROUP BY 
Travel_no , Date_time, Country

but even this might work

Select 
Travel_no
,Date_time
,Country
,(Select Count(Travel_no) from #temp1 b
  where 
  b.Date_time >= CAST(varchar(20),CAST(a.Date_time AS BIGINT) - 10000)
  and b.Date_time <= CAST(varchar(20),CAST(a.Date_time AS BIGINT) + 10000) 
  and a.Country = b.Country
  ) 'country_within_hr_cnt'
FROM #temp1 a
GROUP BY 
Travel_no , Date_time, Country

This assumes you've indexed b on Country and Date_time.