jlp jlp - 1 month ago 12x
Scala Question

How to use Scala DataFrameReader option method

The Scala DataFrameReader has a function "option" which has the following signature:

def option(key: String, value: String): DataFrameReader
// Adds an input option for the underlying data source.

So what is an "input option" for the underlying data source, can someone share an example here on how to use this function?


Spark source code

  def option(key: String, value: String): DataFrameReader = {
    this.extraOptions += (key -> value)

Where extraOptions is simply a Map and used like that:

private def jdbc(
  url: String,
  table: String,
  parts: Array[Partition],
  connectionProperties: Properties): DataFrame = {
  val props = new Properties()
  // THIS
  extraOptions.foreach { case (key, value) =>
    props.put(key, value)
  // connectionProperties should override settings in extraOptions
  val relation = JDBCRelation(url, table, parts, props)(sqlContext)

As you can see, it is simply a method to pass additional property to a jdbc driver.

There is also more general options method to pass the Map instead of single key-value and it's usage example in Spark documentation:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
  "dbtable" -> "schema.tablename")).load()