j9dy - 1 month ago
Scala Question

Spark and MongoDB application in Scala 2.10: Maven build error

I want to build a Scala application with Maven dependencies for Spark and MongoDB. The Scala version I use is 2.10. My pom.xml looks like this (irrelevant parts omitted):

<properties>
  <maven.compiler.source>1.6</maven.compiler.source>
  <maven.compiler.target>1.6</maven.compiler.target>
  <encoding>UTF-8</encoding>
  <scala.tools.version>2.10</scala.tools.version>
  <!-- Put the Scala version of the cluster -->
  <scala.version>2.10.5</scala.version>
</properties>

<!-- repository to add org.apache.spark -->
<repositories>
  <repository>
    <id>cloudera-repo-releases</id>
    <url>https://repository.cloudera.com/artifactory/repo/</url>
  </repository>
</repositories>

<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
  <testSourceDirectory>src/test/scala</testSourceDirectory>
  <!-- <pluginManagement> -->
  <plugins>
    <plugin>
      <!-- see http://davidb.github.com/scala-maven-plugin -->
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.1.3</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
          <configuration>
            <args>
              <arg>-make:transitive</arg>
              <arg>-dependencyfile</arg>
              <arg>${project.build.directory}/.scala_dependencies</arg>
            </args>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.13</version>
      <configuration>
        <useFile>false</useFile>
        <disableXmlReport>true</disableXmlReport>
        <!-- If you have classpath issue like NoDefClassError,... -->
        <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
        <includes>
          <include>**/*Test.*</include>
          <include>**/*Suite.*</include>
        </includes>
      </configuration>
    </plugin>

    <!-- "package" command plugin -->
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
  <!-- </pluginManagement> -->
</build>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.scala</groupId>
    <artifactId>mongo-scala-driver_2.11</artifactId>
    <version>1.1.1</version>
  </dependency>
</dependencies>


When I run mvn clean assembly:assembly, the following error occurs:

C:\Develop\workspace\SparkApplication>mvn clean assembly:assembly
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ SparkApplication ---
[INFO] Deleting C:\Develop\workspace\SparkApplication\target
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.4.1:assembly (default-cli) > package @ SparkApplication >>>
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ SparkApplication ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Develop\workspace\SparkApplication\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ SparkApplication ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.1.3:compile (default) @ SparkApplication ---
[WARNING] Expected all dependencies to require Scala version: 2.10.5
[WARNING] xx.xxx.xxx:SparkApplication:0.0.1-SNAPSHOT requires scala version: 2.10.5
[WARNING] com.twitter:chill_2.10:0.5.0 requires scala version: 2.10.4
[WARNING] Multiple versions of scala libraries detected!
[INFO] C:\Develop\workspace\SparkApplication\src\main\scala:-1: info: compiling
[INFO] Compiling 1 source files to C:\Develop\workspace\SparkApplication\target\classes at 1477993255625
[INFO] No known dependencies. Compiling everything
[ERROR] error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly
[INFO] in package scala.annotation which is not available.
[INFO] It may be completely missing from the current classpath, or the version on
[INFO] the classpath might be incompatible with the version used when compiling package.class.
[ERROR] C:\Develop\workspace\SparkApplication\src\main\scala\com\examples\MainExample.scala:33: error: Reference to method intWrapper in class LowPriorityImplicits should not have survived past type checking,
[ERROR] it should have been processed and eliminated during expansion of an enclosing macro.
[ERROR] val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
[ERROR] ^
[ERROR] two errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.363 s
[INFO] Finished at: 2016-11-01T10:40:58+01:00
[INFO] Final Memory: 20M/353M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project SparkApplication: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException


The error occurs only when I add the mongo-scala-driver_2.11 dependency; without it, the jar builds. My code is currently the Pi estimation example from the Spark website:

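import org.apache.spark.{SparkConf, SparkContext}

object MainExample {
  // NUM_SAMPLES is not shown in the question; this value is an assumption
  val NUM_SAMPLES = 100000

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Cluster Application")
      //.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext(conf)

    // Monte Carlo estimate: the fraction of random points inside the unit quarter circle approximates Pi/4
    val count = sc.parallelize(1 to NUM_SAMPLES).map { _ =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

    sc.stop()
  }
}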


I also tried adding the following exclusion to each dependency, as suggested in a GitHub issue, but it did not help:

<exclusions>
  <exclusion>
    <!-- make sure wrong scala version is not pulled in -->
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
  </exclusion>
</exclusions>


How can I fix this? The MongoDB Scala Driver seems to be built against Scala 2.11, but my Spark setup requires Scala 2.10.

Answer

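Remove the Mongo Scala Driver dependency; it is not compiled for Scala 2.10 and is therefore not compatible. Scala minor versions (2.10, 2.11) are not binary compatible with each other, which is why Scala artifacts carry a _2.10 or _2.11 suffix. The _2.11 driver references classes such as scala.annotation.compileTimeOnly that exist only in the 2.11 standard library, hence the "bad symbolic reference" error. Excluding scala-library, as tried in the question, does not help, because the driver's own classes are still compiled against 2.11.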
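Because every Scala artifact encodes its Scala binary version in the _2.xx suffix of its artifact ID, this kind of mismatch can be spotted mechanically. The following is a minimal sketch of a hypothetical helper (ScalaSuffixCheck is not part of Maven or any plugin) that flags artifact IDs whose suffix differs from the project's scala.tools.version; the artifact list is copied from the pom in the question.

```scala
// Hypothetical helper, not part of Maven or any plugin: flags dependencies whose
// Scala binary-version suffix (e.g. "_2.11") differs from the project's.
object ScalaSuffixCheck {
  // Matches artifact IDs such as "spark-core_2.10" and captures "2.10".
  private val Suffix = """.*_(2\.\d+)""".r

  /** Returns the artifact IDs whose suffix does not match `projectBinaryVersion`.
    * Artifacts without a Scala suffix (e.g. "scala-library" or plain Java
    * libraries) are ignored. */
  def mismatches(projectBinaryVersion: String, artifactIds: Seq[String]): Seq[String] =
    artifactIds.filter {
      case Suffix(v) => v != projectBinaryVersion
      case _         => false
    }

  def main(args: Array[String]): Unit = {
    // Artifact IDs from the question's pom; "2.10" is its scala.tools.version.
    val deps = Seq(
      "scala-library",
      "spark-core_2.10",
      "mongo-spark-connector_2.10",
      "mongo-scala-driver_2.11"
    )
    mismatches("2.10", deps).foreach(a => println(s"Incompatible Scala binary version: $a"))
  }
}
```

Run against that list, it prints a single line naming mongo-scala-driver_2.11, the one dependency that has to go.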

The good news is that the MongoDB Spark Connector is a standalone connector. It uses the synchronous Mongo Java Driver because Spark is designed for CPU-intensive, synchronous tasks. It follows Spark idioms and is all that is needed to connect MongoDB to Spark, so the mongo-spark-connector_2.10 dependency on its own is enough.

The Mongo Scala Driver, on the other hand, follows modern Scala conventions: all I/O is fully asynchronous. This is great for web applications and for improving the scalability of an individual machine.
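To illustrate the difference in style, here is a minimal sketch using only the Scala standard library. countAsync is a hypothetical stand-in that simulates non-blocking I/O with a timer; it is not the Mongo Scala Driver API, which exposes its own Observable types. The caller gets a Future back immediately and composes results without any thread waiting on the "network", which is what helps a web server scale, whereas a Spark task using the synchronous Java driver simply blocks on each call.

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// Illustration only: simulates non-blocking I/O with a timer. This is NOT the
// Mongo Scala Driver API (which exposes its own Observable types).
object AsyncStyle {
  // Daemon timer thread that plays the role of a driver's I/O event loop.
  private val timer = new Timer(true)

  // Hypothetical query: returns immediately and completes the Future later,
  // so no caller thread sits waiting for the result.
  def countAsync(collection: String): Future[Long] = {
    val p = Promise[Long]()
    timer.schedule(new TimerTask {
      def run(): Unit = p.success(collection.length.toLong)
    }, 50L)
    p.future
  }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    // Results are composed without blocking; a web handler could return `total` as-is.
    val total: Future[Long] = for {
      a <- countAsync("users")
      b <- countAsync("orders")
    } yield a + b
    // Blocking here only so this demo can print before the JVM exits.
    println(Await.result(total, 5.seconds))
  }
}
```

Running main prints 11 (the simulated counts 5 and 6 added together); the Await call exists only so the demo can print its result before exiting.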