Birrel Birrel - 6 months ago 12
PHP Question

PHP, MySQL and Cron Jobs - Does a query store all rows to start, or go through sequentially?

I have a MySQL table:

Col 1 | Col 2 | Col 3 | Status
... | ... | ... | 0
... | ... | ... | 1
... | ... | ... | 2
etc


It is important that the table have the most up-to-date information in it, and so a cron job is run every minute, to update the table.

The
Status
column is to store whether the row needs to be updated, or is currently being updated. If the row needs to be updated, the status is 0. If the row is currently being updated, the status is 1. If the row has already been updated, the status is 2.

Once all rows have a status of 2, they are all reset to 0, and the process starts over.

The cron job runs every minute, but sometimes updating a row might take multiple minutes, meaning multiple cron jobs will be running simultaneously.

My question is, if I have a query like:

UPDATE * FROM table WHERE status=0


does the query go through one at a time, to the next 0? Or does the query look at all the rows first, and store which ones it will eventually visit?

EXAMPLE

Say that the following table is set up:

Col 1 | Col 2 | Col 3 | Status
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0


At t=0, the first cron job (cj1) begins. It enters the first row, and sets the status to 1.

Col 1 | Col 2 | Col 3 | Status
... | ... | ... | 1
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0


This process takes more than a minute, and so a second cron job (cj2) begins at t=1m.

cj2 sees that the first row is already being updated, and so goes to the second row.

Col 1 | Col 2 | Col 3 | Status
... | ... | ... | 1
... | ... | ... | 1
... | ... | ... | 0
... | ... | ... | 0
... | ... | ... | 0


Let's say that cj2 is busy updating that row for a few minutes. When cj1 finishes with the first row, will it skip to the 3rd row, because it sees that row 2 has a status of 1? Or will cj2 go to the second row, because it initially had a status of 0 when the query was called?

Answer

Since you're using InnoDB, each query will be performed as a transaction by default. So when you do

UPDATE table
SET <whatever>
WHERE status = 0

it will lock all the rows that match the status value. Other processes that perform a similar query will be blocked if they try to access any of these rows.

The specific way that it does this depends on whether there's an index on the status column. If there is, it simply locks that index entry and then updates all the rows it refers to.

If there's no index, it will have to step through the database sequentially. Whenever it encouters a row with status = 0 it will lock that row and then update it. Other clients may scan the database in a different order, so they might see the rows that this query will update before it gets to it. If they upate the status at the same time as they update other columns, then you shouldn't have a problem, because when this process reaches those rows they won't match the status = 0 criteria any more.

Comments