Dimitriy V. Masterov Dimitriy V. Masterov - 6 months ago 130
SQL Question

Creating and using temporary/volatile database tables In Stata

Addendum: As of Stata 14, volatile tables work without any hacks.



Is there a way to tweak Stata to work with temporary volatile tables? These tables and the data are deleted after a user logs off the session.

Here's an example of a simple toy SQL query that I am using in Stata and Teradata:

odbc load, exec("
BEGIN TRANSACTION;
CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;

SELECT * FROM vol_tab;
END TRANSACTION;
") dsn("mozart");


This is the error message I am getting:

The ODBC driver reported the following diagnostics
[Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement.
SQLSTATE=25000
r(682);


The Stata error code means:


error . . . . . . . . . . . . . . . . . . . . . . . . Return code
682
could not connect to odbc dsn;
This typically occurs because of incorrect permissions, such
as a bad User Name or Password. Use set debug on to display
the actual error message generated by the ODBC driver.


As far as I can tell permission are fine since I can pull data if I just execute the "SELECT TOP 10..." query. I set debug on, but it did not produce any additional information.

Session mode is Teradata. ODBC manager is set to unixODBC. I am using Stata 13.1 on an Ubuntu server.

I believe the underlying issue may be that separate connections are established for each SQL statement, so the volatile table evaporates by the time the select is issued. I am waiting on tech support to verify this.

I tried using the
odbc sqlfile
command well, but this approach does not work unless I create a permanent table at the end of it. There's no load option with
odbc sqlfile
.

Volatile tables seem to work just fine in SAS and R. For example, this works perfectly:

library("RODBC")
db <- odbcConnect("mozart")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;
")
data<- sqlQuery(db,"select * from vol_tab;",rows_at_time=1)


Perhaps this is because the connection to the DB remains open until
close(db)
.

Answer

This answer is not longer correct. Stata now allows multiple SQL statements as long as the multistatement option is added to the odbc command.


Stata's odbc command does not allow combining multiple SQL statements into a single odbc command and altering TD's mode. It also creates a separate connection for each odbc command issued, so the volatile table goes poof by the time you want to use it to do something. This makes it impossible to use volatile tables directly.

However, there is a way to use R through Stata to produce a Stata data file. You need to install rsource from SSC and the foreign and RODBC packages in R. The 2 globals Rterm_path and Rterm_options for rsource can be defined in sysprofile.ado or in your own profile.ado. As far as I can determine, R does not allow exporting timestamps, so I had to do some conversion of dates and timestamps by hand. These conversions are somewhat at odds with the suggestions in the Stata manuals and the Stata blog.

rsource, terminator(END_OF_R)
  library("RODBC")
  library("foreign")
  db <- odbcConnect("mydsn")
  sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT ...) WITH DATA PRIMARY INDEX(...) ON COMMIT PRESERVE ROWS;")
  data<- sqlQuery(db,"SELECT * FROM vol_tab;",rows_at_time=1)
  write.dta(data,"mydata.dta",convert.dates = FALSE)
  close(db)
END_OF_R

use "mydata.dta", replace
/* convert dates and timestamps to Stata format */
gen stata_date = rdate + td(01jan1970)
format stata_date %td
gen double stata_timestamp = (rtimestamp + 315594000)*1000
format stata_timestamp %tc