
Failed to load com.databricks.spark.csv while running with spark-submit

I am trying to run my code with spark-submit using the below command:

spark-submit --class "SampleApp" --master local[2] target/scala-2.11/sample-project_2.11-1.0.jar


My sbt file has the following dependencies:



libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"

libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.5.2"

libraryDependencies += "com.databricks" % "spark-csv_2.11" % "1.2.0"


My code:



import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.SQLContext

object SampleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sample App").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    import sqlContext._
    import sqlContext.implicits._

    val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "/root/input/Account.csv", "header" -> "true"))

    val column_names = df.columns
    val row_count = df.count
    val column_count = column_names.length

    var pKeys = ArrayBuffer[String]()

    // a column is a candidate key when its number of distinct values equals the row count
    for (i <- column_names) {
      if (row_count == df.groupBy(i).count.count) {
        pKeys += df.groupBy(i).count.columns(0)
      }
    }

    pKeys.foreach(print)
  }
}


The error:



16/03/11 04:47:37 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:220)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1253)


My Spark version is 1.4.1 and my Scala version is 2.11.7.

(I am following this link: http://www.nodalpoint.com/development-and-deployment-of-spark-applications-with-scala-eclipse-and-sbt-part-1-installation-configuration/)

I have tried the following versions of spark-csv_2.10:

1.4.0
1.3.1
1.3.0
1.2.0
1.1.0
1.0.3
1.0.2
1.0.1
1.0.0

etc.

Please help!

Answer

Since you are running the job in local mode, add the external jar paths using the --jars option:

spark-submit --class "SampleApp" --master local[2] --jars file:[path-of-spark-csv_2.11.jar],file:[path-of-other-dependency-jar] target/scala-2.11/sample-project_2.11-1.0.jar

e.g.

spark-submit --jars file:/root/Downloads/jars/spark-csv_2.11-1.2.0.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar
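
Alternatively (not in the original answer, but worth noting): if the machine has internet access, spark-submit can resolve spark-csv and its transitive dependencies from Maven Central for you via the --packages flag (available since Spark 1.3), which saves collecting jars by hand. Assuming the Scala 2.11 build of spark-csv 1.2.0:

spark-submit --packages com.databricks:spark-csv_2.11:1.2.0 --class "SampleApp" --master local[2] target/scala-2.11/sample-project_2.11-1.0.jar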

Another thing you can do is create a fat jar. With SBT, see proper-way-to-make-a-spark-fat-jar-using-sbt; with Maven, refer to create-a-fat-jar-file-maven-assembly-plugin.

Note: mark the scope of the Spark artifacts (i.e. spark-core, spark-streaming, spark-sql, etc.) as provided; otherwise the fat jar becomes far bigger than it needs to be, since spark-submit already supplies those classes at runtime.
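
For the SBT route, here is a minimal build.sbt sketch (my own illustration, not from the original answer; it assumes the sbt-assembly plugin and aligns every dependency with the question's Spark 1.4.1 / Scala 2.11.7 setup, rather than the mismatched spark-sql 1.5.2 in the question's sbt file):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt
name := "sample-project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  // "provided": spark-submit supplies these at runtime, so they stay out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.4.1" % "provided",
  // spark-csv is not part of the Spark distribution, so it must ship inside the fat jar
  "com.databricks"   %% "spark-csv"  % "1.2.0"
)

Running sbt assembly then produces target/scala-2.11/sample-project-assembly-1.0.jar, which you can pass to spark-submit with no --jars at all.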