
MySQL table performance problems on simple UPDATE/SELECT (large table, many connections)

I've been experimenting with various approaches for about a week, always ending up "crashing" my server through massive load during the test runs.

mysql> explain select id FROM task_jobs FORCE INDEX (index_update_get_work) WHERE customer_job_id=31 AND client_reserved=0 AND result_delivered=0 AND (assigned_instance is NULL) LIMIT 10;
+----+-------------+--------------------+------------+------+-----------------------+-----------------------+---------+-------------------------+--------+----------+-------------+
| id | select_type | table              | partitions | type | possible_keys         | key                   | key_len | ref                     | rows   | filtered | Extra       |
+----+-------------+--------------------+------------+------+-----------------------+-----------------------+---------+-------------------------+--------+----------+-------------+
|  1 | SIMPLE      | task_jobs          | NULL       | ref  | index_update_get_work | index_update_get_work | 14      | const,const,const,const | 104226 |   100.00 | Using where |
+----+-------------+--------------------+------------+------+-----------------------+-----------------------+---------+-------------------------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


To keep it generic: I have a table with millions of rows, and it keeps growing.

The table provides work jobs to hundreds or even thousands of cloud instances at a time.

All those instances query my table (up to 3000 queries at once) asking to receive their work data.

There are a few hundred thousand rows with "open jobs", but only 10-20 are handed out to any one instance at a time.

My current approach, which is the most performant so far but still a big problem:


  1. I run an UPDATE with LIMIT 10 on rows WHERE customer_job_id=31 AND client_reserved=0 AND result_delivered=0 AND (assigned_instance is NULL)

    I guess the query is self-explanatory: it looks for unassigned jobs from a specific "job id" that have not delivered a result yet.

    The query looks like the one at the beginning, just with UPDATE instead of SELECT (see the sketch after this list).

  2. Now program logic chooses some of the delivered rows and makes a second update to finally assign them to the instance, using WHERE id IN (x,x,x,x,x)
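
Pieced together from the description, the reservation UPDATE would look roughly like this (a sketch; $instance_id stands for the requesting instance's identifier, and the FORCE INDEX hint from the SELECT is left out):

UPDATE task_jobs
SET client_reserved = 1, assigned_instance = '$instance_id'
WHERE customer_job_id = 31
    AND client_reserved = 0
    AND result_delivered = 0
    AND assigned_instance IS NULL
LIMIT 10;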



I am using this approach so I can quickly "lock" 10 rows by updating them to be busy, so that the next instance can lock another 10 rows, and so on.

This works fine and without any issues for 100 instances at a time; if I drive the load up to 500 instances, the server locks up.
The database connections fill up with hundreds of LOCKed requests, each taking 15 seconds to update 10 rows (it was at 140 seconds before optimization).

As you can see at the beginning, the SELECT (in reality it's an UPDATE ... SET client_reserved = 1, assigned_instance = $instance_id) has to go through 100k rows (possibly more).
Even though it only picks 10 of them, it seems to examine every single job before it finishes and updates the first 10. At least that's what EXPLAIN suggests.

So basically my question is: is there a better approach?

I need to hand out thousands of rows within a few seconds to thousands of distinct connections.

Each time, I need to get a small number of rows out of the 100-500k available jobs/rows matching "WHERE customer_job_id=31 AND client_reserved=0 AND result_delivered=0 AND (assigned_instance is NULL)".

assigned_instance is a VARCHAR (with an index(1) prefix index for the NULL condition); the others are TINYINT(1). I made a composite index combining all of them, but it didn't really help.
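
For reference, a composite index like that might be declared as follows (a sketch; the actual column order inside index_update_get_work isn't shown in the question, and assigned_instance(1) is the one-character prefix index mentioned above):

ALTER TABLE task_jobs
    ADD INDEX index_update_get_work
    (customer_job_id, client_reserved, result_delivered, assigned_instance(1));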

Update

For clarification:

I am using the "UPDATE" because the API on the main server does not know whether there are other simultaneous "give me work" requests.

So I used UPDATE on a number of rows to "reserve" them for the current instance.

Since UPDATE is an "atomic" operation in SQL, there is no risk that another request is served the same jobs (no race condition).
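
One detail left implicit above: since UPDATE ... LIMIT 10 does not return the rows it touched, the instance has to read its reservations back afterwards. A sketch, assuming the UPDATE stamped assigned_instance as described earlier:

-- fetch exactly the rows this instance just reserved
SELECT id
FROM task_jobs
WHERE assigned_instance = '$instance_id'
    AND result_delivered = 0;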

Update question

A general question: I use LIMIT 10.

Why does it search through 100,000 results if 10 are enough?
It makes no performance difference when I add ORDER BY RAND(), where it really does have to look through all 100k results and reorder them (same performance cost).
Why doesn't MySQL just stop once it has found 10 hits? (That's what I hoped for with LIMIT 10 and by omitting any ORDER BY clause.)

Answer

OK, so what you need is a task queue: a table that stores references to available jobs, which can be "popped" from the queue.

CREATE TABLE task_queue (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    task_job_id INT UNSIGNED NOT NULL  -- references task_jobs.id
) ENGINE=InnoDB;

You could VERY quickly "pop" x items from the table with something like this:

-- a WRITE lock is needed: under a READ lock the session cannot run its own DELETE,
-- and ORDER BY id ensures the SELECT and the DELETE hit the same rows
LOCK TABLES task_queue WRITE;
SELECT * FROM task_queue ORDER BY id LIMIT x;
DELETE FROM task_queue ORDER BY id LIMIT x;
UNLOCK TABLES;
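
The ids returned by the SELECT can then be marked in task_jobs just like step 2 in the question (a sketch; the IN list holds the task_job_id values popped above):

UPDATE task_jobs
SET client_reserved = 1, assigned_instance = '$instance_id'
WHERE id IN (x,x,x,x,x);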

You could continue to write onto the end of the queue, with a cron running every minute to retrieve and queue new tasks matching your criteria:

INSERT INTO task_queue (task_job_id)
SELECT id
FROM task_jobs
FORCE INDEX (index_update_get_work)
WHERE customer_job_id = 31
    AND client_reserved = 0
    AND result_delivered = 0
    AND assigned_instance IS NULL
    AND id NOT IN (SELECT task_job_id FROM task_queue);
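
If you would rather keep the refill inside MySQL instead of cron, the same statement can run on the event scheduler (a sketch; the event name refill_task_queue is made up here, and the server needs event_scheduler=ON):

CREATE EVENT refill_task_queue
ON SCHEDULE EVERY 1 MINUTE
DO
    INSERT INTO task_queue (task_job_id)
    SELECT id
    FROM task_jobs
    WHERE customer_job_id = 31
        AND client_reserved = 0
        AND result_delivered = 0
        AND assigned_instance IS NULL
        AND id NOT IN (SELECT task_job_id FROM task_queue);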