
The error java.lang.ClassNotFoundException when running spark-submit

I want to run spark-submit for my Scala Spark application. These are the steps I followed:

1) execute Maven Clean and Package from IntelliJ IDEA to get myTest.jar
2) execute the following spark-submit command:

spark-submit --name 28 --master local[2] --class org.test.consumer.TestRunner \
/usr/tests/test1/target/myTest.jar \
$arg1 $arg2 $arg3 $arg4 $arg5
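A side note unrelated to the ClassNotFoundException: the unquoted $arg1 … $arg5 in the command above are subject to shell word splitting, so a value containing a space would arrive at spark-submit as two separate arguments. A quick self-contained demonstration (the variable name and value are made up for illustration):

```shell
# Demonstrates why the positional arguments should be quoted:
# an unquoted variable containing a space is split into two words.
value="group id"

set -- $value          # unquoted: word splitting applies
unquoted_count=$#

set -- "$value"        # quoted: the value stays a single argument
quoted_count=$#

echo "unquoted=$unquoted_count quoted=$quoted_count"
```

Quoting each argument ("$arg1" "$arg2" …) avoids this.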


This is the TestRunner object that I want to run:

package org.test.consumer

import org.test.consumer.kafka.KafkaConsumer

object TestRunner {

  def main(args: Array[String]) {

    val Array(zkQuorum, group, topic1, topic2, kafkaNumThreads) = args

    val processor = new KafkaConsumer(zkQuorum, group, topic1, topic2)
    processor.run(kafkaNumThreads.toInt)
  }
}
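Not related to the error, but worth noting: the val Array(...) = args pattern throws a scala.MatchError rather than a readable message when the argument count is wrong. A hedged sketch of a small guard — parseArgs and its usage message are illustrative additions, not part of the original code, and the KafkaConsumer call is stubbed out:

```scala
object TestRunner {

  // Illustrative helper (not in the original code): validates the five
  // positional arguments and converts the thread count to Int up front.
  def parseArgs(args: Array[String]): (String, String, String, String, Int) = {
    require(args.length == 5,
      s"Usage: zkQuorum group topic1 topic2 kafkaNumThreads (got ${args.length} args)")
    val Array(zkQuorum, group, topic1, topic2, kafkaNumThreads) = args
    (zkQuorum, group, topic1, topic2, kafkaNumThreads.toInt)
  }

  def main(args: Array[String]): Unit = {
    val (zkQuorum, group, topic1, topic2, numThreads) = parseArgs(args)
    // In the real application (from the question):
    // val processor = new KafkaConsumer(zkQuorum, group, topic1, topic2)
    // processor.run(numThreads)
    println(s"consumer config: $zkQuorum $group $topic1 $topic2 $numThreads")
  }
}
```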


But the spark-submit command fails with the following message:

java.lang.ClassNotFoundException: org.test.consumer.TestRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)


I don't really understand why the object TestRunner cannot be found if the package is specified correctly... Does it have something to do with using an object instead of a class?

UPDATE:

The project structure (the folder scala is currently marked as Sources):

/usr/tests/test1
    .idea
    src
        main
            docker
            resources
            scala
                org
                    test
                        consumer
                            kafka
                                KafkaConsumer.scala
                            TestRunner.scala
        test
    target

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.test.abc</groupId>
  <artifactId>consumer</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.8</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.11</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_2.11</artifactId>
      <version>2.7.5</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.7.5</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.7.5</version>
    </dependency>
    <dependency>
      <groupId>org.sedis</groupId>
      <artifactId>sedis_2.11</artifactId>
      <version>1.2.2</version>
    </dependency>
    <dependency>
      <groupId>com.lambdaworks</groupId>
      <artifactId>jacks_2.11</artifactId>
      <version>2.3.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib-local_2.11</artifactId>
      <version>2.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.github.nscala-time</groupId>
      <artifactId>nscala-time_2.11</artifactId>
      <version>2.12.0</version>
    </dependency>
  </dependencies>

</project>

Answer

@FiofanS, the problem is in your directory structure.

Maven follows a convention-over-configuration policy: by default, it expects you to follow the set of rules it has defined. For example, it expects all your code to live in the src/main/java directory (see the Maven Standard Directory Layout). But your code isn't in src/main/java; it's in src/main/scala, and by default Maven will not treat src/main/scala as a source location.

Although Maven expects you to follow its conventions, it doesn't enforce them; it also provides ways to configure things to your preference. In your case, you have to explicitly instruct Maven to treat src/main/scala as an additional source location.

To do this, use the Maven Build Helper Plugin. Add the following within the <project>...</project> tag of your pom.xml:

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.7</version>
        <executions>
          <execution>
            <id>add-source</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>src/main/scala</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

This should solve your problem.
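One caveat (an addition beyond the answer above): build-helper-maven-plugin only registers the directory as a source root; Maven still needs a Scala compiler plugin to actually compile the .scala files. If the build doesn't already configure one, a commonly used option is the scala-maven-plugin — a sketch to place alongside the plugin above, with a version you should verify against your Scala/Spark combination:

```xml
<!-- Illustrative addition: compiles src/main/scala during the build.
     Check the plugin version against your Scala version before use. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.2.2</version>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

After mvn clean package, running jar tf /usr/tests/test1/target/myTest.jar and looking for org/test/consumer/TestRunner.class is a quick way to confirm the class actually made it into the jar.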