Lars Hartviksen Lars Hartviksen - 1 month ago 11
SQL Question

Does String concatenation in database query cause performance issues?

I have a table named T_PERSON with rows such as FIRSTNAME varchar, LASTNAME varchar, CUSTOMERTYPE int, SELLERID int (and more)

Additionally I have a large "SQL IN statement" list of persons like this

:( 'JOHNxSMITHx12345x1337', 'SARAxBANNERx7612x1337' ... )


That is : FIRSTNAME x LASTNAME x CUSTOMERTYPE x SELLERID

If I run this query:

SELECT * FROM T_PERSON WHERE
STR_REPLACE(FIRSTNAME + 'x' + LASTNAME + 'x' + STR(CUSTOMERID) + 'x' + STR(SELLERID)) in
( 'JOHNxSMITHx12345x1337', 'SARAxBANNERx7612x1337' )


This query works fine on a tiny database with a few hundred rows, I wonder if I would be able to run this on a huge database with hundreds of millions of rows in T_PERSON.

Will this query be very performance heavy? E.g. if N is the size of the T_PERSON table, will database software in general make N^4 strings and compare each combination to all values in the list?

Answer

We ended up going for a very different solution, running one Query per row to be updated. Now I feel stupid and the guy at DB who did the QA of my Query who asked the question should be ashamed.

The column names FIRSTNAME LASTNAME etc. are of course all implicitly a part of the one and same TABLE ROW. So they can be written as THESAMEROW.FIRSTNAME, THESAMEROW.LASTNAME etc. Looking at it this way, only one string will of course be constructed per row in the database.

My worry that N^4 strings would be created is therefore completely wrong, and my Query would scale perfectly linear on a larger database.