akaWizzmaster akaWizzmaster - 6 months ago 20
SQL Question

Updating millions of records in SQL Server 2012

I think this will be easier to explain by including a couple of tables.

Table A:

UserId StoreId
1 10
2 20
3 30
4 40


Table B:

UserId StoreId
1 10
1 20
1 20
2 20
2 10
2 10
3 30
3 40
3 40
4 40
4 30
4 30


I need a script that will go into Table 2, and Update the store to match the
StoreId
in Table A for that specific User. In the end, I should see;

Table B:

UserId StoreId
1 10
1 10
1 10
2 20
2 20
2 20
3 30
3 30
3 30
4 40
4 40
4 40


So my question is, what is the easiest way to do this, considering that Table B has hundred of millions of rows in it, and table A has about 20 million rows?

Thanks in advance!

Answer

The best way is:

ALTER TABLE Table_B
DROP COLUMN StoreId

Now you're no longer storing duplicate data and you can just get the StoreId from Table_A when you need it.

Otherwise it's just a simple UPDATE with a JOIN:

UPDATE B
SET
    StoreId = A.StoreID
FROM
    Table_B B
INNER JOIN TABLE_A A ON A.UserId = B.UserId

You can put this into a loop that performs batches if you need to in order to cut down on the size of the update so that it doesn't flood your transaction log. There are a few ways to do that. One example:

DECLARE
    @batch_size INT = 10000,
    @min_user_id INT = 1,
    @max_user_id INT

SELECT @max_user_id = MAX(UserId) FROM Table_A

WHILE (@min_user_id <= @max_user_id)
BEGIN
    UPDATE B
    SET
        StoreId = A.StoreID
    FROM
        Table_B B
    INNER JOIN TABLE_A A ON A.UserId = B.UserId
    WHERE
        B.UserId BETWEEN @min_user_id AND @min_user_id + @batch_size

    SET @min_user_id = @min_user_id + @batch_size + 1
END