Zi0n1 Zi0n1 - 2 months ago 6
SQL Question

Poor performance from looping SQL Server Update statement

So I have a stored procedure that needs to regularly update a large table with around 70 million records in it. I typically always have followed the standard of doing looping updates to avoid locking any of my other large tables and with that I normally didnt see a performance impact too great.

The Problem:
I am seeing my execution time increase by 10 or 20 times when I use looping compared to the original run time without looping logic.

For Example:

If I were to run the below query I it would update 3 million records in around 1.5 minutes.

UPDATE [db1].[dbo].[Preferences]
SET LastUpdate = Getdate()
WHERE
LastUpdate >= CAST(CONVERT( Varchar(10), DATEADD(day,-1,GETDATE()), 110) as DateTime)
AND
LastUpdate < CAST(CONVERT( Varchar(10), GETDATE(), 110) as DateTime)
AND (@PreferenceID is null or @PreferenceID = PreferenceID)


This is my regular update statement without a looping mechanism behind it. Basically the parameter @PreferenceID is either supplied with an ID or is left null. Depending on that it will update either all lastUpdate's too today's date or only update one preferenceID. In my test case I am using 1 preferenceID so the @PreferenceID is being populated.

So when I added looping to this statement it went from taking 1.5 minutes to 20 minutes.

Below is the looping statement:

BEGIN
SET ROWCOUNT 10000

UPDATE [DB1].[dbo].[Preferences]
SET LastUpdate = Getdate()
WHERE
LastUpdate >= CAST(CONVERT( Varchar(10), DATEADD(day,-1,GETDATE()), 110) as DateTime)
AND
LastUpdate < CAST(CONVERT( Varchar(10), GETDATE(), 110) as DateTime)
AND (@PreferenceID is null or @PreferenceID = PreferenceID)

WHILE @@ROWCOUNT > 0

UPDATE [DB1].[dbo].[Preferences]
SET LastUpdate = Getdate()
WHERE
LastUpdate >= CAST(CONVERT( Varchar(10), DATEADD(day,-1,GETDATE()), 110) as DateTime)
AND
LastUpdate < CAST(CONVERT( Varchar(10), GETDATE(), 110) as DateTime)
AND (@PreferenceID is null or @PreferenceID = PreferenceID)


SET ROWCOUNT 0
END


So my core question is why would my execution time increase by so much just by doing a loop of 10k records a time? I can post more detailed info in regards to the table structure but I wasnt sure if this is just a steadfast rule that it will be a performance hit to do a looping update compared to just one full batch update.

Thanks in advance for any advice anyone can provide.

Answer

So, I would suspect that this is slower because on every loop your query has to figure out the set of records to update i.e. evaluate this:

   WHERE
        LastUpdate >= CAST(CONVERT( Varchar(10), DATEADD(day,-1,GETDATE()), 110) as DateTime) 
        AND 
        LastUpdate < CAST(CONVERT( Varchar(10), GETDATE(), 110) as DateTime)
        AND (@PreferenceID is null or @PreferenceID = PreferenceID)

I would also note that your second update does not do exactly the same thing as the first - try running both of these just before midnight one day - you'll get different results for the second set of updates.