Gunnar Herman Gunnar Herman - 5 months ago 24
SQL Question

Filtered Out String Still Causes Error in SQL Statement

I am using SQL Server 2008 R2.

I am writing a part number generator app. Our part numbers consist of a nine digit numeric value, such as 914602001. Before creating a new number, there are several sources that need to be checked if it already exists. To try and save time, I've created a simple union of some of these sources. The union is as follows:

SELECT DISTINCT ItemNumber
FROM dbo.EngPartNumbers
WHERE (ItemNumber NOT LIKE '%[^0-9]%')
UNION
SELECT DISTINCT ValueText COLLATE SQL_Latin1_General_CP1_CI_AS AS ItemNumber
FROM [PDMWE-Bel-ArtProductsDocManagement].dbo.VariableValue AS vv
WHERE (ValueText NOT LIKE '%[^0-9]%') AND (LEN(ValueText) = 9)


The first table, EngPartNumbers, was an Excel file that was imported into SQL. It contains one column, ItemNumber, and is of varchar data type. It has to be varchar because there was a point in time where we used letters in our naming convention.

The second table is looking to our EPDM, where VariableValue is the table that stores all of the values to our variables, which is housed in the Variables table. The ValueText column is a varchar that holds all of the variable values. In my case, I'm only concerned about 9 digit numeric values, so I applied the last line:

WHERE (ValueText NOT LIKE '%[^0-9]%') AND (LEN(ValueText) = 9)


The results of the union is what I would expect; only numbers:

Query results from vw_PDM_Union_Items

This is where my problem is. Because I would like to get the next available number, I want to work with the int data type, not the varchar. When I select everything from my view, CAST the column as an int, and add a WHERE clause, like so:

SELECT ItemNumber
FROM (
SELECT CAST(ItemNumber AS int) AS ItemNumber
FROM vw_PDM_Union_Items
) AS x
WHERE ItemNumber < 800900000


I get the following error:


Conversion failed when converting the nvarchar value '"SW-Revision"' to data type int.


After research, I've noticed the "SW-Revision" refers to a value of a variable, which is stored in the ValueText column of the VariableValue table. To me, this shouldn't matter since I'm looking at my view which has already filtered out this bad data. I've even tried wrapping my view in a select statement that CASTs the column as an int, like so:

SELECT CAST(ItemNumber AS int) AS ItemNumber
FROM (SELECT DISTINCT ItemNumber
FROM dbo.EngPartNumbers
WHERE (ItemNumber NOT LIKE '%[^0-9]%')
UNION
SELECT DISTINCT ValueText
COLLATE SQL_Latin1_General_CP1_CI_AS AS ItemNumber
FROM [PDMWE-Bel-ArtProductsDocManagement].dbo.VariableValue AS vv
WHERE (ValueText NOT LIKE '%[^0-9]%')
AND (LEN(ValueText) = 9)) AS item
WHERE (ItemNumber NOT LIKE '%[^0-9]%')


but I still get the same error. Why is SQL acting this way? What is happening in the background that causes it to look at the original table? If anyone can shed light on this situation and give me a better way to accomplish this, it would be appreciated. For manipulation sake, I would like to work with an int column, rather than a varchar.

Thank you in advance.

Answer

You can recreate this error very easily

SELECT  *
FROM    (   SELECT  ValueText
            FROM    (VALUES ('A'), ('1')) t (ValueText)
            WHERE   t.ValueText NOT LIKE '%[^0-9]%'
        ) t
WHERE   ValueText < 10;

Msg 245, Level 16, State 1, Line 1

Conversion failed when converting the varchar value 'A' to data type int.

The reason is because regardless of whether you use a view, or just a normal query, you cannot control the order in which SQL Server applies the WHERE predicates.

Because of the error, we can't see what SQL server is doing by inspecting the execution plan, but a quick change to the query (SQL Server 2012+):

SELECT  *
FROM    (   SELECT  ValueText
            FROM    (VALUES ('A'), ('1')) t (ValueText)
            WHERE   t.ValueText NOT LIKE '%[^0-9]%'
        ) t
WHERE   TRY_CONVERT(INT, ValueText) < 10;

Gives the following execution plan:

enter image description here

You can see, that SQL Server effectively simplifies the query this up to:

SELECT  ValueText
FROM    (VALUES ('A'), ('1')) t (ValueText)
WHERE   TRY_CONVERT(INT, ValueText) < 10;
AND     t.ValueText NOT LIKE '%[^0-9]%';

It does the same with an implicit convert, so in the initial query you are just executing:

SELECT  ValueText
FROM    (VALUES ('A'), ('1')) t (ValueText)
WHERE   ValueText < 10;
AND     t.ValueText NOT LIKE '%[^0-9]%';

Therefore, you get an error when evaluating A < 10 since SQL Server tries to implicitly convert A to an integer in order to compare it with 10.

What you need to solve this is intermediate materialisation, that is to say, to force SQL Server to evaluate the subquery first, store the result, and then apply the outer predicates. This is easier said than done though. There is a connect item open to request this as a query hint, but, for now there are two main workarounds.

1. Use a temporary table/table variable/Multi step TVF to materialise the results.

DECLARE @T TABLE (ValueText INT)
INSERT @T (ValueText)
SELECT  ValueText
FROM    (VALUES ('A'), ('1')) t (ValueText)
WHERE   t.ValueText NOT LIKE '%[^0-9]%';

SELECT  *
FROM    @T
WHERE   ValueText < 10;

This is obviously not ideal for you since you want to use a view.

2. Use TOP 2147483647 (Massive Hack)

SELECT  ValueText
FROM    (   SELECT  TOP 2147483647 ValueText
            FROM    (VALUES ('A'), ('1')) t (ValueText)
            WHERE   t.ValueText NOT LIKE '%[^0-9]%'
        ) t
WHERE   ValueText < 10;

It is a hack, and not guaranteed to work (although does in most cases I have had to use it), but using TOP and a number much bigger than you need will usually force the intermediate materialization of the results.

3. Use a CASE expression to remove records

SELECT  *
FROM    (   SELECT  ValueText = CASE WHEN ValueText NOT LIKE '%[^0-9]%' THEN ValueText END
            FROM    (VALUES ('A'), ('1')) t (ValueText)
            WHERE   t.ValueText NOT LIKE '%[^0-9]%'
        ) t
WHERE   ValueText < 10;

Again, this works, but I can't see that it would be guaranteed to work, there is no reason a future update would not recognise that the where predicate and the case predicate are the same and optimise the case expression away.