ThePerplexedOne ThePerplexedOne - 15 days ago 5
MySQL Question

How can I improve my select query to make it faster?

I have a table of employees (roughly 4.5 million) which have columns called

job_title
and
domain
.

I wish to be able to create and run dynamic queries to select specific employees based on the job title and whose domain is also within the array of domains provided.

How it works, is a user on the front-end ends up with a big array of domains (anywhere from 200 - 10,000+), they then enter two things:


  1. Which job titles they want included

  2. Which job titles they want excluded



So we then build a query that ends up looking like this:

SELECT employee_id
FROM employee
WHERE (
domain LIKE '%shetlandfoods.co.uk'
OR domain LIKE '%example1.co.uk'
OR domain LIKE '%example2.co.uk'

-- About 50 additional domains in this list

OR domain LIKE '%example50.co.uk
)
AND (job_title LIKE '%Manager%' OR job_title LIKE '%Director%')
AND (job_title NOT LIKE '%Assistant%')


(Note that the list of domains is extremely long, and may include thousands of domains(!))

Now, a query like this takes around 230 seconds, and that's only with around 180 domains! Imagine using thousands; it'd take forever.

I was wondering if there's any way I can optimize/change this query to make it run a bit faster? Or is there anything I can do to the database?

Answer

Here are two options which come to mind (the second suggested by @paul above in his comment).

One is that you could pre process the data in the domain column such that it only has the exact domain and nothing else. This is a well-known problem and is relatively easy to handle in Java or JavaScript. If this were done, then you can put an index on the domain column and use a WHERE clause looking something like the following:

WHERE domain IN ('shetlandfoods.co.uk',
                 'alac.shetland.co.uk',
                 'malakofflimited.co.uk',
                 ...)

Another option might be to compare the reverse of the domain against the reverse of the terms you had in your original WHERE clause, e.g.

WHERE REVERSE(domain) LIKE 'ku.oc.sdoofdnaltehs%' OR
      REVERSE(domain) LIKE 'ku.oc.dnaltehs.cala%' OR
      REVERSE(domain) LIKE 'ku.oc.detimilffokalam%' OR
      ...

You could even store the reverse of the domain from your app/UI layer so that you don't have to force MySQL to compute the reverse for each term in the WHERE clause.

I might lean towards the first option, assuming you have the bandwidth to extract out the domains before hitting MySQL.