Melo Tang Melo Tang - 2 months ago 11
SQL Question

SQL Union All questions

My database contains two tables named

DomesticSalesOrders
and
InternationalSalesOrders
. Both tables contain more than 100 million rows. Each table has a Primary Key column named
SalesOrderId
. The data in the two tables is distinct from one another.

Business users want a report that includes aggregate information about the total number of global sales and total sales amounts. I need to ensure that my query executes in the minimum possible time. Which query should I use?

Option 1:

SELECT
COUNT(*) AS NumberOfSales,
SUM( SalesAmount ) AS TotalSalesAmount
FROM
(
SELECT
SalesOrderId,
SalesAmount
FROM
DomesticSalesOrders

UNION ALL

SELECT
SalesOrderId,
SalesAmount
FROM
InternationalSalesOrders
) AS p


Option 2:

SELECT
COUNT(*) AS NumberOfSales,
SUM( SalesAmount ) AS TotalSalesAmount
FROM
DomesticSalesOrders

UNION ALL

SELECT
COUNT(*) AS NumberOfSales,
SUM( SalesAmount ) AS TotalSalesAmount
FROM
InternationalSalesOrders


I think both are correct but I can't understand what is different? thanks

Dai Dai
Answer

The first answer is correct because it only returns a single row:

NumberOfSales        | TotalSalesAmount
---------------------+----------
COUNT( of subquery ) | SUM( of subquery )

Whereas the second answer returns two rows:

NumberOfSales          | TotalSalesAmount
-----------------------+----------
COUNT( of subquery 1 ) | SUM( of subquery 1 )
COUNT( of subquery 2 ) | SUM( of subquery 2 )

A better answer would be to use the split-subqueries of option 2 to take advantage of parallelization, and then finally compute a second level of aggregation:

SELECT
    SUM( [inner].NumberOfSales ) AS NumberOfSales,
    SUM( [inner].SalesAmount   ) AS TotalSalesAmount
FROM
(
    SELECT
        COUNT(*) AS NumberOfSales,
        SUM( SalesAmount ) AS TotalSalesAmount
    FROM
        DomesticSalesOrders

    UNION ALL 

    SELECT
        COUNT(*) AS NumberOfSales,
        SUM( SalesAmount ) AS TotalSalesAmount
    FROM
        InternationalSalesOrders
) AS [inner]

Of course, you would need to see how your RDBMS executes it and compare execution plans. An engine smart enough would generate the same execution plan for this query as for Option 1 in your question, but you can't always make that assumption.

Comments