
How to use Scala DataFrameReader option method

The Scala DataFrameReader has a function "option" which has the following signature:

def option(key: String, value: String): DataFrameReader
// Adds an input option for the underlying data source.


So what is an "input option" for the underlying data source, can someone share an example here on how to use this function?

Answer

From the Spark source code:

  def option(key: String, value: String): DataFrameReader = {
    this.extraOptions += (key -> value)
    this
  }

Here extraOptions is simply a Map, and it is used, for example, in the JDBC reader like this:

private def jdbc(
  url: String,
  table: String,
  parts: Array[Partition],
  connectionProperties: Properties): DataFrame = {
  val props = new Properties()
  // THIS: every key-value pair added via option()/options() is copied into props
  extraOptions.foreach { case (key, value) =>
    props.put(key, value)
  }
  // connectionProperties should override settings in extraOptions
  props.putAll(connectionProperties)
  val relation = JDBCRelation(url, table, parts, props)(sqlContext)
  sqlContext.baseRelationToDataFrame(relation)
}

As you can see, it is simply a way to pass additional properties through to the underlying data source, in this case the JDBC driver.
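
For example, any key you set with option ends up in the props object above and is forwarded along with the connection properties. A minimal sketch (the URL, table name, credentials, and fetchsize value are placeholders, not taken from the question):

// Minimal sketch: url, dbtable, user, password and fetchsize are placeholder values.
// Each option(key, value) call ends up in the Properties passed to the JDBC relation.
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .option("fetchsize", "1000") // driver-specific setting forwarded as a connection property
  .load()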

There is also a more general options method, which takes a Map instead of a single key-value pair. Here is a usage example from the Spark documentation:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
      "dbtable" -> "schema.tablename")).load()