
Optimizing Update Query with compound index

I tested an update between two large tables (~5 million records each) which was taking 10 seconds or so per update. So, running Explain for my very first time, I tested the select:

SELECT
T1.Z, T2.Z
FROM
TableB T1
INNER JOIN TableL T2
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber


Explain returned the following as possible keys:


  • Name

  • C

  • S

  • Number



and chose C as the key.

I was told that my best bet was to make a compound key, in the same column order as the select, so I did

Alter Table TableB Add Index Compound (Name,C,S,Number)


And did an explain again, hoping it would choose my compound index, but even though it now shows the compound index as a possible key it still chooses index C.

I read that I can force the index I want with:

SELECT
T1.Z, T2.Z
FROM TableB T1 Force Index(Compound)
INNER JOIN TableL T2
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber


yet I am not sure it makes any sense to override MySQL's selection, and given that the update is going to take almost two years if it doesn't help, it doesn't seem like a smart thing to test.

Is there some step I am missing? Do I need to remove the other keys so that it chooses my compound one, and if so, how will I know whether it will even make a difference (given that MySQL saw it and rejected it)?
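If removing the other keys is the answer, I assume I could at least simulate that first with an index hint rather than actually dropping anything; something like the following, if I understand IGNORE INDEX correctly:

SELECT
T1.Z, T2.Z
FROM TableB T1 IGNORE INDEX (C)
INNER JOIN TableL T2
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber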




Explain output on T1 (Table1). Note: I did not yet add the Compound index to the full table, as it is huge and it might be wasted time until I figure this out; I previously added it on a highly truncated version of the table, but that won't help with this explain.


  • select_type: simple

  • type: ref

  • possible_keys: Number, C, S, Name

  • key: Name

  • key_len: 303

  • ref: func

  • rows: 4

  • Extra: using where



Explain for Table2


  • select_type: SIMPLE

  • type: ALL

  • possible_keys: MinNumber, MaxNumber

  • key:

  • key_len:

  • ref:

  • rows: 5,447,100

  • Extra:



Cardinality (only showing indexes relevant here as there are a few others):


  • Primary: 5139680

  • Name: 1284920

  • Number: 57749

  • C: 7002

  • S: 21


Answer

So based on some great comments/input I came up with a solution. One lightbulb moment came from Paul Spiegel's input that trying to join two 5+ million row tables on several VARCHAR fields was not recommended.

So what I did was create a `Unique` table with ID and UniqueRecord fields.

I then made the UniqueRecord a Unique Index.
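The table itself is trivial; roughly like this (the VARCHAR size here is illustrative, not necessarily what I used):

CREATE TABLE `Unique` (
ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
UniqueRecord VARCHAR(255) NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY UniqueRecord (UniqueRecord)
) ENGINE=InnoDB;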

I inserted into that table from both Table1 and Table2 as:

Insert IGNORE into `Unique` (UniqueRecord) 
Select Concat(Name,C,S) from Table1 Group by Name,C,S;
Insert IGNORE into `Unique` (UniqueRecord) 
Select Concat(Name,C,S) from Table2 Group by Name,C,S;

This gave me unique records from both within and between the two tables.

I then added a UniqueRecord_ID field to both Table1 and Table2.
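That was a plain ALTER on each table; something like this, assuming an unsigned INT to match the Unique.ID column:

Alter Table Table1 Add Column UniqueRecord_ID Int Unsigned Null;
Alter Table Table2 Add Column UniqueRecord_ID Int Unsigned Null;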

I then did a join between each table and the `Unique` table to write the UniqueRecord ID to each table:

Update Table1 as T1
Inner Join `Unique` as T2
On Concat(T1.Name,T1.C,T1.S) = T2.UniqueRecord
Set T1.UniqueRecord_ID=T2.ID

Finally, I added a key to each table on UniqueRecord_ID.
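i.e. something like the following (index names are just illustrative):

Alter Table Table1 Add Index UniqueRecord_ID (UniqueRecord_ID);
Alter Table Table2 Add Index UniqueRecord_ID (UniqueRecord_ID);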

My Explain showed that it only used that key on T2, however. But whereas the select was previously taking 10 seconds per record (I tested on 1, 10, and 100 records and stopped there, as I did not have the requisite 578 days to test the whole table :| ), the entire select, returning close to 5 million records, took 72 seconds.
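For completeness, the final select just joins on that single integer key instead of the three VARCHAR columns; sketched from the original query, it looks roughly like this (the Number range conditions are unchanged):

SELECT
T1.Z, T2.Z
FROM Table1 T1
INNER JOIN Table2 T2
on T1.UniqueRecord_ID=T2.UniqueRecord_ID
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber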