Question

R: Painfully slow read performance using RODBC & SQL Server

I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring the raw data in from the database and perform calculations within R rather than have the database summarize the data.

I am able to connect to the database using RODBC, execute a query, and receive the results in a data.frame. However, the read time in R is about 12x longer than the same query executed in SQL Server Management Studio (SSMS): SSMS takes ~600 ms, whereas R takes about 7.6 seconds. My question is whether I am doing something wrong, or is R just really slow at database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file into R)?
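
For reference, the connection is opened along these lines (the connection string and query below are placeholders, not the actual server, database, or column names):

library(RODBC)

# Placeholder DSN-less connection string; the real one points at the SQL Server instance
cn <- odbcDriverConnect(
  "driver={SQL Server};server=myserver;database=mydb;trusted_connection=true"
)

# Placeholder query; the real one returns ~250K rows (one date column plus three numeric columns)
query <- "SELECT obs_date, val1, val2, val3 FROM some_table"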

Some information about the query that may help: the query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R and SSMS is a high-end Win 7 workstation with 32 GB of memory. The R command that I am running is:

system.time(df <- sqlQuery(cn, query))


which returns:

   user  system elapsed
   7.17    0.01    7.58


Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R is busy doing things internally for several seconds before returning the data.frame. I see this because network utilization spikes in the first second and almost immediately returns to near 0; then, several seconds later, the R data.frame returns.
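
One way to narrow down where that time goes (just a diagnostic sketch using standard RODBC arguments; neither option is confirmed to change the result here) is to rerun the query with type conversion disabled and with a larger fetch batch, and compare the timings:

# Return all columns as character, skipping RODBC's conversion to R types
system.time(df_raw <- sqlQuery(cn, query, as.is = TRUE))

# Fetch up to 1024 rows per ODBC call instead of the default 100
system.time(df2 <- sqlQuery(cn, query, rows_at_time = 1024))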

Answer

I would try RJDBC (http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf) together with Microsoft's JDBC drivers for SQL Server (https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx):

library(RJDBC)

# classPath must point to the sqljdbc4.jar from the Microsoft JDBC driver download
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", "/sqljdbc4.jar")
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")
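
To compare with the RODBC numbers, the same query can be timed the same way (using the connection above and the query from the question; the speedup is not guaranteed, so it is worth measuring):

system.time(df <- dbGetQuery(con, query))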