view raw
David Portabella David Portabella - 7 months ago 52
Scala Question

Query a MySQL Database from Spark

I am currently querying a MySQL from Spark as follows:

val sqlContext = new SQLContext(sc)

val dataframe_mysql = sqlContext
.read.format("jdbc").option("driver", "com.mysql.jdbc.Driver")
.option("url", url)
.option("dbtable", "MyTable")
.option("user", "MyUser").option("password", "MyPwd")


val lastNames = dataframe_mysql.sqlContext.sql("select lastName from MyTable")

But this loads all the columns from the
, and I don't need that (I only need the
column). (moreover, I need a sql join query, and I don't want to load everything into Spark).

How can I replace the
option("dbtable", "MyTable")
line to specify a SQL query instead of a table?

I have tried querying with
.option("dbtable", "select lastName from MyTable")
(as I read somewhere), but this fails with :

MySQLSyntaxErrorException: select lastName from MyTable WHERE 1=0


What you have read is correct, you'll need to pass the query with pushdown predicates as an option.

MySQL expects here a aliased table or the table fullname :

// without pushdown predicates
.option("dbtable", "MyTable")

// with pushdown predicates
.option("dbtable", "(select lastName from MyTable) as T")

So you are actually looking to use the later query.

Note: T can be anything here.