pythonlarry pythonlarry - 1 month ago 6
SQL Question

Return pre-UPDATE Column Values Using SQL Only - PostgreSQL Version

I have a related question, but this is another part of MY puzzle.

I would like to get the OLD VALUE of a Column from a Row that was UPDATEd - WITHOUT using Triggers (nor Stored Procedures, nor any other extra, non-SQL/-query entities).

The query I have is like this:

UPDATE my_table
SET processing_by = our_id_info -- unique to this worker
WHERE trans_nbr IN (
SELECT trans_nbr
FROM my_table
GROUP BY trans_nbr
HAVING COUNT(trans_nbr) > 1
LIMIT our_limit_to_have_single_process_grab
)
RETURNING row_id;


If I could do "FOR UPDATE ON my_table" at the end of the subquery, that'd be devine (and fix my other question/problem). But, that won't work: can't have this AND a "GROUP BY" (which is necessary for figuring out the COUNT of trans_nbr's). Then I could just take those trans_nbr's and do a query first to get the (soon-to-be-) former processing_by values.

I've tried doing like:

UPDATE my_table
SET processing_by = our_id_info -- unique to this worker
FROM my_table old_my_table
JOIN (
SELECT trans_nbr
FROM my_table
GROUP BY trans_nbr
HAVING COUNT(trans_nbr) > 1
LIMIT our_limit_to_have_single_process_grab
) sub_my_table
ON old_my_table.trans_nbr = sub_my_table.trans_nbr
WHERE my_table.trans_nbr = sub_my_table.trans_nbr
AND my_table.processing_by = old_my_table.processing_by
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by


But that can't work;
old_my_table
is not visible outside the join; the
RETURNING
clause is blind to it.

I've long since lost count of all the attempts I've made; I have been researching this for literally hours.

If I could just find a bullet-proof way to lock the rows in my subquery - and ONLY those rows, and WHEN the subquery happens - all the concurrency issues I'm trying to avoid would disappear ...




UPDATE: [WIPES EGG OFF FACE] Okay, so I had a typo in the non-generic code of the above that I wrote "doesn't work"; it does... thanks to Erwin Brandstetter, below, who stated it would, I re-did it (after a night's sleep, refreshed eyes, and a banana for bfast). Since it took me so long/hard to find this sort of solution, perhaps my embarrassment is worth it? At least this is on SO for posterity now... :>

What I now have (that works) is like this:

UPDATE my_table
SET processing_by = our_id_info -- unique to this worker
FROM my_table AS old_my_table
WHERE trans_nbr IN (
SELECT trans_nbr
FROM my_table
GROUP BY trans_nbr
HAVING COUNT(*) > 1
LIMIT our_limit_to_have_single_process_grab
)
AND my_table.row_id = old_my_table.row_id
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by AS old_processing_by


The COUNT(*) is per a suggestion from Flimzy in a comment on my other (linked above) question. (I was more specific than necessary. [In this instance.])

Please see my other question for correctly implementing concurrency and even a non-blocking version; THIS query merely shows how to get the old and new values from an update, ignore the bad/wrong concurrency bits.

Answer

Problem

The manual explains:

The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.

Emphasis mine. There is no way to access the old row in a RETURNING clause. You can do that in a trigger or with a separate SELECT before the UPDATE, wrapped in a transaction as @Flimzy and @wildplasser commented, or wrapped in a CTE as @MattDiPasquale posted.

Solution

However, what you are trying to achieve works perfectly fine if you join in another instance of the table in the FROM clause:

UPDATE tbl x
SET    tbl_id = 23
     , name = 'New Guy'
FROM   tbl y                -- using the FROM clause
WHERE  x.tbl_id = y.tbl_id  -- must be unique
AND    x.tbl_id = 3
RETURNING y.tbl_id AS old_id, y.name AS old_name
        , x.tbl_id          , x.name;

Returns:

 old_id | old_name | tbl_id |  name
--------+----------+--------+---------
  3     | Old Guy  | 23     | New Guy

SQL Fiddle.

I tested this with PostgreSQL versions from 8.4 to 9.6.

It's different for INSERT:

Dealing with concurrent load

There are several ways to avoid race conditions with concurrent write operations. The simple, slow and sure (but expensive) method is to run the transaction with SERIALIZABLE isolation level.

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ..;
COMMIT;

But that's probably overkill. And you'd need to be prepared to repeat the operation if you get a serialization failure.
Simpler and faster (and just as reliable with concurrent write load) is an explicit lock on the one row to be updated:

UPDATE tbl x
SET    tbl_id = 24
     , name = 'New Gal'
FROM  (SELECT tbl_id, name FROM tbl WHERE tbl_id = 4 FOR UPDATE) y 
WHERE  x.tbl_id = y.tbl_id
RETURNING y.tbl_id AS old_id, y.name AS old_name, x.tbl_id, x.name;

More explanation, examples and links under this related question: