Kristian - 11 months ago
MySQL Question

What's the difference between sqlContext.read.jdbc(...) and sqlContext.read.format('jdbc')?

I'm playing with Spark connections to a local MySQL instance.

I've got a MySQL JDBC jar that I'm passing in:

    pyspark --jars /path/to/jar

Then I create my SQLContext, etc. When I start making connections, one version throws an error and the other does not:

    sqlContext.read.jdbc(url="jdbc:mysql://localhost:3306?user=root", table="spark.words")

This throws a "driver not found" error.

    sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306?user=root").option("dbtable", "spark.words").option("driver", "com.mysql.jdbc.Driver").load()

This works as expected.

I thought the two were roughly the same, with the former being a convenience method for the latter. What's the difference, and why does the sqlContext.read.jdbc version error out?


Generally speaking, these two methods should be equivalent, although there can be edge cases where things don't work as expected (for example, DataFrameWriter with a JDBC source seems to behave slightly differently with format("jdbc") than with jdbc(...)).

In this particular case, though, the answer is simple. These calls are not equivalent because the second one explicitly declares the driver class, while the first one does not.

If you want them to behave the same way, pass the driver class in the properties dict:

    sqlContext.read.jdbc(
        url=..., table=...,
        properties={"driver": "com.mysql.jdbc.Driver"})
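To make the difference concrete, here is a rough sketch in plain Python (so it runs without Spark or MySQL) of the option maps the two calls end up with. `jdbc_call_to_options` is a hypothetical helper, not a PySpark function. Its merge order (properties first, then url/dbtable overriding them) is my reading of how DataFrameReader.jdbc builds its options in Spark 2.x and may differ in other versions. As far as I can tell, when no "driver" option is present, Spark falls back to java.sql.DriverManager to find a driver for the URL, which is where a jar passed only via --jars can fail to be seen.

```python
# Hypothetical helper (NOT part of PySpark): approximates how
# read.jdbc(url, table, properties=...) turns its arguments into the
# option map that read.format("jdbc").option(...).load() receives.
def jdbc_call_to_options(url, table, properties=None):
    options = {}
    # Connection properties are applied first...
    options.update(properties or {})
    # ...then the explicit url/dbtable arguments override them.
    options["url"] = url
    options["dbtable"] = table
    return options


URL = "jdbc:mysql://localhost:3306?user=root"
TABLE = "spark.words"

# Options set explicitly in the working format("jdbc") version.
format_options = {
    "url": URL,
    "dbtable": TABLE,
    "driver": "com.mysql.jdbc.Driver",
}

# jdbc(...) without properties: there is no "driver" key at all.
without_props = jdbc_call_to_options(URL, TABLE)

# jdbc(...) with properties={"driver": ...}: same options as format("jdbc").
with_props = jdbc_call_to_options(URL, TABLE, {"driver": "com.mysql.jdbc.Driver"})

print(sorted(set(format_options) - set(without_props)))  # ['driver']
print(with_props == format_options)  # True
```

The only key missing from the failing call is "driver", which is why adding it through `properties` makes the two versions line up.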