stitch70 stitch70 - 3 days ago 5
SQL Question

SELECT INTO query with NOT IN subquery takes a long time / hangs

The issue is that this query hangs or has infinite records which I am not aware on how to fix using MS ACCESS:

Expected User Input:

User input Start Date: 1/15/2015
User input End Date: 11/15/2015
User input Upper Data Threshold in kB: 50


Source Table:

[Master] Table in Access:
Invc Date Mobile Nbr PktDtVol
--------- ---------- --------
1/15/15 647-409-8206 48kB
2/15/15 647-409-8206 33kB
3/15/15 647-409-8206 8000kB
4/15/15 647-409-8206 20kB
5/15/15 647-409-8206 10kB
6/15/15 647-409-8206 0kB
7/15/15 718-500-2311 3kB
8/15/15 718-500-2311 45kB
9/15/15 718-500-2311 25kB
10/15/15 514-300-3311 33kB
11/15/15 514-300-3311 20kB


Output in [Temp_Table]:

Invc Date Mobile Nbr PktDtVol Difference in Days
--------- ---------- -------- -------------------
7/15/15 718-500-2311 3kB 304
8/15/15 718-500-2311 45kB 304
9/15/15 718-500-2311 25kB 304
10/15/15 514-300-3311 33kB 304
11/15/15 514-300-3311 20kB 304


Accepted SQL Solution to generate the above output:

PARAMETERS [Start Date] DateTime, [End Date] DateTime, [Upper Bound Usage in KB] IEEEDouble;
SELECT m.[Invc Date], m.PktDtVol, m.[Mobile Nbr], DateDiff("d",[Start Date],[End Date]) AS [Difference in days]
INTO Temp_Table FROM Master AS m
WHERE (m.[Invc Date]>=[Start Date] And m.[Invc Date])<=[End Date] AND m.[Mobile Nbr] NOT IN
(SELECT q.[Mobile Nbr] FROM Master AS q WHERE (q.PktDtVol>=[Upper Bound Usage in KB]));


From here, I've attempted to create an index to optimize the query by modifying the table through another SQL statement, but does not work:

CREATE INDEX Index2 ON Master([Ttl Charges])


The query works fine with no hanging with 10 records in the source table with multiple records with the expected output. But the issue arises when there is 56,000 records in the source table with multiple records.

Answer

An index on [Ttl Charges] isn't going to help you any, but an index on [PktDtVol] will. I just did a test with 10,000 rows and the lack of an index on [PktDtVol] was definitely a performance bottleneck:

Indexes:
  none
Time:
  170 seconds (just under 3 minutes)

Indexes:
  [Invc Date]
  [Mobile Number]
Time:
  (same as before)

Indexes:
  [Invc Date]
  [Mobile Number]
  [PktDtVol]
Time:
  36 seconds

For an additional performance boost, you can reformulate the query to use a LEFT JOIN instead of a NOT IN clause with a subquery, as you mentioned in your comment:

PARAMETERS [Start Date] DateTime, [End Date] DateTime, [Upper Bound Usage in KB] IEEEDouble; 
SELECT 
    [Master].[Invc Date], 
    [Master].PktDtVol, 
    [Master].[Mobile Nbr], 
    DateDiff("d",[Start Date],[End Date]) AS [Difference in days] 
INTO Temp_Table 
FROM 
    [Master] 
    LEFT OUTER JOIN 
    (
        SELECT DISTINCT q.[Mobile Nbr] FROM Master AS q 
        WHERE (q.PktDtVol>=[Upper Bound Usage in KB])
    ) s 
        ON [Master].[Mobile Nbr] = s.[Mobile Nbr] 
WHERE 
    [Master].[Invc Date] >= [Start Date]
        AND [Master].[Invc Date] <= [End Date]
        AND s.[Mobile Nbr] IS NULL;
Comments