Ravi Ranjan - 14 days ago
JSON Question

sbt run works but ./spark-submit does not

I want to work with the lift-json parser using an sbt build. My build.sbt file has the following contents:

name := "MyProject"

version := "1.0"

scalaVersion := "2.10.0"
// https://mvnrepository.com/artifact/net.liftweb/lift-json_2.10
libraryDependencies += "net.liftweb" % "lift-json_2.10" % "3.0-M1"
val lift_json = "net.liftweb" %% "lift-json" % "3.0-M1"
//val json4sNative = "org.json4s" %% "json4s-native" % "3.3.0"
//libraryDependencies += "org.scala-lang" % "scala-library" % "2.9.1"
lazy val gitclonefile = "/root/githubdependencies/lift"
lazy val g = RootProject(file(gitclonefile))
lazy val root = project in file(".") dependsOn g


My code is this:

package org.inno.parsertest
import net.liftweb.json._
//import org.json4s._
//import org.json4s.native.JsonMethods._
object parser {
  def main(args: Array[String]): Unit = {
    val x = parse(""" { "numbers" : [1, 2, 3, 4] } """)
    println(x)
    val x1 = "jaimin is awesome"
    println(x1)
  }
}


sbt package followed by sbt run works, but when I try to run this using spark-submit, I get the following error:

Error: application failed with exception
java.lang.NoClassDefFoundError: net/liftweb/json/package$
at org.inno.parsertest.parser$.main(jsonparser.scala:7)
at org.inno.parsertest.parser.main(jsonparser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: net.liftweb.json.package$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more


How can I make ./spark-submit work?

GPI
Answer

As soon as the Spark driver starts running your app (when you submit it), it has to resolve the import net.liftweb.json._ line, which means it will look for those classes on its classpath.

But Spark does not ship with Lift's JSON jar, so the lookup fails and you end up with a ClassNotFoundException.

So you need to provide the required jars along with your application. There are several ways, discussed at length elsewhere, to do that.

You might start with the Spark documentation:

Bundling Your Application’s Dependencies
If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar you can call the bin/spark-submit script as shown here while passing your jar.
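
To illustrate what the documentation describes, here is a minimal sketch of an sbt-assembly setup, reusing the lift-json dependency from your build.sbt; the plugin and Spark version numbers are assumptions and should be matched to your environment:

// project/assembly.sbt -- pulls in the sbt-assembly plugin (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt -- lift-json gets bundled into the fat jar, while Spark is marked
// "provided" so it is not packaged (the cluster supplies it at runtime)
name := "MyProject"

version := "1.0"

scalaVersion := "2.10.0"

libraryDependencies += "net.liftweb" % "lift-json_2.10" % "3.0-M1"

// Spark version here is an assumption; use the one matching your cluster
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

Running sbt assembly then produces a single jar under target/scala-2.10/ that you can hand to spark-submit.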

One might suggest:

  1. Package your application as what is often called an "uber jar" or "fat jar", with e.g. sbt's "assembly" plugin or Maven Shade, depending on your preference (see the build.sbt sketch above). This strategy merges all of the classes and resources of all dependencies into the single JAR that you submit.

  2. Add arguments to the spark-submit call. There are several ways; an easy one is the --jars argument, followed by a comma-separated list of the jar files you need. Spark adds these jars to the actual driver/worker classpath before launching your jobs (see the example commands after this list).

  3. Tell spark-submit to pull the dependencies from a Maven repository:

    Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories.
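
For options 2 and 3, the spark-submit invocations could look roughly like this; the jar names and paths below are assumptions for illustration:

# Option 2: ship the lift-json jar explicitly with --jars
./bin/spark-submit \
  --class org.inno.parsertest.parser \
  --jars /path/to/lift-json_2.10-3.0-M1.jar \
  target/scala-2.10/myproject_2.10-1.0.jar

# Option 3: let Spark resolve the dependency from Maven Central with --packages
./bin/spark-submit \
  --class org.inno.parsertest.parser \
  --packages net.liftweb:lift-json_2.10:3.0-M1 \
  target/scala-2.10/myproject_2.10-1.0.jar

With --jars you are responsible for listing the jar (and anything it depends on) yourself; with --packages Spark resolves the Maven coordinates and their transitive dependencies for you.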

But a full discussion of all the options would be rather long, so I suggest you google "package spark applications" or search Stack Overflow on these topics to get a better overview.

Sidenote: submitting to Spark an app that does not even use a SparkContext seems pointless, but I guess you're just experimenting at this point.