I have a base query that extracts Record_ID + associated Service_Number, followed by a LEFT JOIN to look up an additional related Service_Code. I've confirmed there are unique records in each sub-table.
The explain plan shows total estimated time for this query as "1 min 25 seconds", yet if I wrap a CREATE MULTISET VOLATILE TABLE statement over the query, suddenly the explain plan shows total estimated time of "72 hours 20 minutes". If I run the VOLATILE TABLE creation anyway, the job finishes within the minute.
What contributes to this additional load time? Is there anything I can look at to reduce this?
Estimated time isn't really an estimate of how long it will take to run. They should really refer to it as "estimated cost" since it doesn't really indicate a remotely realistic estimate of the actual runtime.
Without seeing your actual script, my guess would be that you picked a bad primary index for your volatile table. Since Teradata is massively parallel, it distributes your data between the AMPs based on the primary index. If you pick a bad primary index then your data does not get evenly distributed and it may be trying to load all the data into a single AMP (in the worst case). This can result in massive slowness.
If you just want to load data quickly without worrying about downstream performance then specify
NO PRIMARY INDEX for your table and it will ensure an even distribution of data accross all AMPs. However, this may give you less than great performance later on when you try to join this to other tables. So it is a good idea to put some thought into your primary index.