Rotem Ashkenazi · asked 4 years ago
MySQL Question

Is it possible to define a schema when loading a table from RDS (MySQL) into Spark?

Spark infers the schema from a MySQL database on RDS (AWS environment).
The data type of one of the columns, "active", is tinyint(1), with possible values:


  • non-active

  • active

  • pending

  • ....



Spark recognizes tinyint(1) as a Boolean type, so it converts every value in "active" to "true" or "false".
As a result, I can't identify the original values.
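
For reference, this is roughly how I load the table (the endpoint, table name and credentials below are placeholders):

val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://my-rds-host:3306/mydb")  // placeholder RDS endpoint
  .option("dbtable", "users")                           // placeholder table
  .option("user", "myuser")                             // placeholder credentials
  .option("password", "mypassword")
  .load()

df.printSchema()
// root
//  |-- active: boolean (nullable = true)   <-- inferred as boolean instead of tinyint(1)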

Is it possible to define a schema when loading tables into Spark?

Answer Source

It's not Spark that converts the TINYINT type into a boolean, but Connector/J (the MySQL JDBC driver) used under the hood.

So you actually don't need to specify a schema for this issue. What's causing it is the JDBC driver, which treats the TINYINT(1) data type as BIT (because the server silently converts BIT -> TINYINT(1) when creating tables).

You can check all the tips and gotchas of the jdbc connector in the MySQL official Connector/J Configuration Properties guide.

You just need to pass the right parameter to the JDBC connector by adding the following to your connection URL:

val newUrl = s"$oldUrl&tinyInt1isBit=false"

val data = spark.read.format("jdbc")
  .option("url", newUrl)
  // your other jdbc options
  .load()
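
For completeness, a fuller version of the read with the flag set might look like this (the endpoint, database, table and credentials are placeholders to adapt to your setup):

val oldUrl = "jdbc:mysql://my-rds-host:3306/mydb?useSSL=false"  // placeholder endpoint/database
val newUrl = s"$oldUrl&tinyInt1isBit=false"                     // keep TINYINT(1) as a numeric type

val data = spark.read.format("jdbc")
  .option("url", newUrl)
  .option("dbtable", "users")        // placeholder table
  .option("user", "myuser")          // placeholder credentials
  .option("password", "mypassword")
  .load()

data.printSchema()
// "active" should now come through as a small integer type rather than boolean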