Vinod Sagar Vinod Sagar - 2 months ago 4
SQL Question

Performance issues generating a unique name

I have a table 'Objects' present in SQL Server DB. It contains the names (string) of objects.
I have a list of names of new objects that need to be inserted in the 'Objects' table, in a separate table 'NewObjects'. This operation will be referred as 'import' henceforward.

I need to generate a unique name for each record to be imported to 'Objects' from 'NewObjects', if the record name is already present in 'Objects'. This new name will be stored in 'NewObjects' table against the old name.



DECLARE @NewObjects TABLE
(
...
Name varchar(20),
newName nvarchar(20)
)


I have implemented a stored procedures which generates unique name for each record to be imported from 'NewObjects'. However, I am not happy with the performance for 1000 records (in 'NewObjects'.)
I want help to optimize my code. Below is the implementation:



PROCEDURE [dbo].[importWithNewNames] @args varchar(MAX)

-- Sample of @args is like 'A,B,C,D' (a CSV string)
...


DECLARE @NewObjects TABLE
(
_index int identity PRIMARY KEY,
Name varchar(20),
newName nvarchar(20)
)

-- 'SplitString' function: this is a working implementation which is right now not concern of performance
INSERT INTO @NewObjects (Name)
SELECT * from SplitString(@args, ',')

declare @beg int = 1
declare @end int
DECLARE @oldName varchar(10)

-- get the count of the rows
select @end = MAX(_index) from @NewObjects

while @beg <= @end
BEGIN
select @oldName = Name from @NewObjects where @beg = _index

Declare @nameExists int = 0

-- this is our constant. We cannot change
DECLARE @MAX_NAME_WIDTH int = 5

DECLARE @counter int = 1
DECLARE @newName varchar(10)
DECLARE @z varchar(10)

select @nameExists = count(name) from Objects where name = @oldName
...
IF @nameExists > 0
BEGIN
-- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

while EXISTS (select top 1 1 from Objects where name = @newName)
OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
BEGIN
select @counter = @counter + 1
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

select top 1 @z = @newName from Objects

update @NewObjects
set newName = @z where @beg = _index
END

select @beg = @beg + 1
END

-- finally, show the new names generated
select * from @NewObjects

Answer

DISCLAIMER: I am in no position to test these recommendations therefore there may be syntax errors that you'll have to work out on your own as you implement them. They are here as a guide to both fix this procedure but also aid you in growing your skill set for future projects.

One optimization just skimming through, that would become more prevalent as you iterated over larger sets, is this code here:

select @nameExists = count(name) from Objects where name = @oldName
...
IF @nameExists > 0

consider changing it to this:

IF EXISTS (select name from Objects where name = @oldName)

Also, rather than doing this:

-- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

while EXISTS (select top 1 1 from Objects where name = @newName)
 OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
BEGIN
    select @counter = @counter + 1
    select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

consider this:

DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

SELECT @maxName = MAX(name) FROM Objects WHERE name > @newName ORDER BY name
IF (@maxName IS NOT NULL)
BEGIN
    @counter = CAST(SUBSTRING(@maxName, 2) AS INT)
    SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

that will ensure that you're not iterating and doing multiple queries just to find the maximum integer value of the generated name.

Further, based on what little context I have, you should also be able to make one more optimization that will ensure you only have to do the aforementioned one time, ever.

DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

IF (@beg = 1)
BEGIN
    SELECT @maxName = MAX(name) FROM Objects WHERE name > @newName ORDER BY name
    IF (@maxName IS NOT NULL)
    BEGIN
        @counter = CAST(SUBSTRING(@maxName, 2) AS INT)
        SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
    END
END

The reason I say you can make that optimization is because unless you have to worry about other entities inserting records during this time that look like the ones you are (e.g. Fxxxxx), then you only have to find the MAX one time and can simply iterate @counter over the loop.

In fact, you could actually pull this entire piece out of the loop. You should be able to extrapolate that pretty easily. Just pull the DECLARE and SET of @counter out along with the code inside the IF (@beg = 1). But take it one step at a time.


Also, change this line:

select top 1 @z = @newName from Objects

to this:

SET @z = @newName

because you are literally running a query to SET two local variables. This is likely a huge cause for the performance issues. A good practice for you to get into is unless you're actually setting a variable from a SELECT statement, use the SET operation for local variables. There are some other places in your code where this applies, consider this line:

select @beg = @beg + 1

use this instead:

SET @beg = @beg + 1

Finally, as stated above regarding simply iterating @counter, at the end of the loop where you have this line:

select @beg = @beg + 1

just add a line:

SET @counter = @counter + 1

and you're golden!


So to recap, you can gather the maximum conflicting name just one time so you'll be getting rid of all those iterations. You're going to start using SET to get rid of performance ridden lines like select top 1 @z = @newName from Objects where you're actually querying a table to set two local variables. And you're going to leverage the EXISTS method instead of setting a variable that leveraged an AGGREGATE function COUNT to do that work.

Let me know how these optimizations work.

Comments