Joseph Kingry Joseph Kingry - 4 months ago 16
SQL Question

Why is inserting into and joining #temp tables faster?

I have a query that looks like

SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM
(
<joined tables here>
) AS P
FOR XML AUTO,ROOT('ROOT')


P has ~ 5000 rows
A and B ~ 4000 rows each

This query has a runtime performance of ~10+ minutes.

Changing it to this however:

SELECT
P.Column1,
P.Column2,
P.Column3,
...
INTO #P

SELECT
A.ColumnX,
A.ColumnY,
...
INTO #A
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A

SELECT
B.ColumnX,
B.ColumnY,
...
INTO #B
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B


SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
#A AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
#B AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM #P AS P
FOR XML AUTO,ROOT('ROOT')


Has a performance of ~4 seconds.

This makes not a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default. My inclination is that SQL is doing the wrong type of "join" with the subquery, but maybe I've missed it, there's no way to specify the join type to use with correlated subqueries.

Is there a way to achieve this without using #temp tables/@table variables via indexes and/or hints?

EDIT: Note that dbo.TableReturningFunc1 and dbo.TableReturningFunc2 are inline TVF's, not multi-statement, or they are "parameterized" view statements.

Answer

Your procedures are being reevaluated for each row in P.

What you do with the temp tables is in fact caching the resultset generated by the stored procedures, thus removing the need to reevaluate.

Inserting into a temp table is fast because it does not generate redo / rollback.

Joins are also fast, since having a stable resultset allows possibility to create a temporary index with an Eager Spool or a Worktable

You can reuse the procedures without temp tables, using CTE's, but for this to be efficient, SQL Server needs to materialize the results of CTE.

You may try to force it do this with using an ORDER BY inside a subquery:

WITH    f1 AS
        (
        SELECT  TOP 1000000000
                A.ColumnX,
                A.ColumnY
        FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
        ORDER BY
                A.key
        ),
        f2 AS
        (
        SELECT  TOP 1000000000
                B.ColumnX,
                B.ColumnY,
        FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B  
        ORDER BY
                B.Key
        )
SELECT  …

, which may result in Eager Spool generated by the optimizer.

However, this is far from being guaranteed.

The guaranteed way is to add an OPTION (USE PLAN) to your query and wrap the correspondind CTE into the Spool clause.

See this entry in my blog on how to do that:

This is hard to maintain, since you will need to rewrite your plan each time you rewrite the query, but this works well and is quite efficient.

Using the temp tables will be much easier, though.

Comments