Desmond Desmond - 10 days ago 6
SQL Question

How to handle large data query on mysql with better performances

At the moment, I am trying to run a specific query via MySQL workbench from three different table in 2 different DB schema and I can't achieve it.

I currently have a

track
and
completed
table in a database schema and a
location
table database in a different server.


  1. The
    track
    table collects the start and end of a todo.

  2. The
    completed
    table save the results

  3. The
    location
    database is used to get the location of creation and completion of the todo.



track



+----+----------+---------------+---------------------+
| tid | user_id | function_name | track_time |
+-----+---------+---------------+---------------------+
| 1 | des | create | 2015-02-29 1 pm |
| 2 | des | complete | 2015-02-29 2 pm |
| 3 | greg | create | 2015-02-29 3 pm |
| 4 | greg | complete | 2015-02-29 4 pm |
+-----+---------+---------------+---------------------+


completed



+-----+------+---------------------+
| tid | uid | insert_time |
+-----+------+---------------------+
| 1 | des | 2015-02-29 1 pm |
| 2 | des | 2015-02-29 2 pm |
| 3 | greg | 2015-02-29 3 pm |
| 4 | greg | 2015-02-29 4 pm |
+-----+------+---------------------+


location



+----+----------+---------------+----------+
| tid | user_id | action | location |
+-----+---------+---------------+----------+
| 1 | des | create | subways |
| 2 | des | complete | home |
| 3 | greg | create | home |
| 4 | greg | complete | market |
+-----+---------+---------------+----------+


I am able to get a join result from two table in the same db schema below:

Query result



+-----+---------+---------------+-----------------+-----+------+---------------+
| tid | user_id | function_name | track_time | tid | uid | insert_time |
+-----+---------+---------------+-----------------+-----+------+---------------+
| 2 | des | complete | 2015-02-29 1 pm | 2 | des | 15-02-29 2 pm |
| 4 | greg | complete | 2015-02-29 3 pm | 4 | greg | 15-02-29 4 pm |
+-----+---------+---------------+-----------------+-----+------+---------------+


select * from
svr1.tracking t,
svr1.completed c
where
t.user_id = c.uid
and t.tid = c.tid
and t.function_name = 'create'
and t.track_time > '2015-02-29 00:00:00'
and t.track_time < '2015-02-29 23:59:59'


However, I also need the
location
info in my query BUT the location table for one single day has 150 million records and it took forever to run as my mac's 16gb ram runs out even they are indexed.

I required the output to have

user_id,
create tid,
function_name,
track_time,
create location,
complete tid,
function_name,
track_time,
location


which will give me an output like below:

des, 1, create, 2015-02-29 1 pm, subways, 2,complete, 2015-02-29 2 pm, home


This is the combine track info and results
enter image description here

the user info

enter image description here

in location the ID is hashed mac address of user device, i wanted to find out the track time and the location recorded time for each particular user.
enter image description here

I wonder whats the best script to write to achieve it as running with workbench doesnt work for me.

Thank you for reading and any comments are greatly appreciated!!

Answer

I eventually solved the issues through another question post: How to get latest results by date when selecting from two table?

By exporting the results to another server DB, i am able to get what i needed via the code below.

SELECT
r1.uid,r1.tid,r1.insert_time,l1.location,l1.time,timeDiff
FROM
(
select 
r.uid,
r.tid,
l.time,
l.location,
r.androidId,
r.insert_time,
min(abs(TIME_TO_SEC(TIMEDIFF(insert_time,l.time)))) as timeDiff
from
locationDB.track_result_submitted_mac r inner join
locationDB.location_archival_2015_09 l

on androidId = id
where

l.time > '2015-09-23 00:00:00' and l.time < '2015-09-30 23:59:59'  and
r.insert_time > '2015-09-23 00:00:00' and r.insert_time < '2015-09-30 23:59:59'

and abs(TIME_TO_SEC(TIMEDIFF(insert_time,l.time))) < 18000 
group by uid,tid

) as t,

indoorloc.track_result_submitted_mac r1 inner join
indoorloc.location_archival_2015_09 l1
on r1.androidId = l1.id

WHERE 
(abs(TIME_TO_SEC(TIMEDIFF(r1.insert_time,l1.time)))) = t.tim and
r1.uid = t.uid and
r1.tid = t.tid

group by uid, tid