null null - 4 months ago 45
Scala Question

Unable to run a jar or sparkApplication on aws EMR

I have a very simple app that I'm trying to run on aws emr. The jar has been built using assembly with spark a provided dependency. It resides on S3 along with a test text file that I wanted to test.

In the EMR UI I select to add a step and add the details telling it the location of the jar and the argument file location.

It runs but always fails with an error - I then set up a new cluster(sanity checking) and rna again only to get the same result, any help is appreciated.

Thank you

The error from the log:

16/03/18 11:40:56 INFO client.RMProxy: Connecting to ResourceManager at ip-10-1-1-234.ec2.internal/10.1.1.234:8032
16/03/18 11:40:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/18 11:40:56 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
16/03/18 11:40:56 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/18 11:40:56 INFO yarn.Client: Setting up container launch context for our AM
16/03/18 11:40:56 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/18 11:40:56 INFO yarn.Client: Preparing resources for our AM container
16/03/18 11:40:57 INFO yarn.Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-1.jar -> hdfs://ip-10-1-1-234.ec2.internal:8020/user/hadoop/.sparkStaging/application_1458297951763_0003/spark-assembly-1.6.0-hadoop2.7.1-amzn-1.jar
16/03/18 11:40:57 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1458297958626
16/03/18 11:40:57 INFO metrics.MetricsSaver: Created MetricsSaver j-DKMA93DFZ456:i-91bff215:SparkSubmit:20036 period:60 /mnt/var/em/raw/i-91bff215_20160318_SparkSubmit_20036_raw.bin
16/03/18 11:40:58 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 590 raw values into 1 aggregated values, total 1
16/03/18 11:40:59 INFO fs.EmrFileSystem: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation


16/03/18 11:41:00 INFO metrics.MetricsSaver: Thread 1 created MetricsLockFreeSaver 1
16/03/18 11:41:00 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-030f9d29-f7ca-42fa-9caf-64ea103a2bb1/__spark_conf__7615049662154628286.zip -> hdfs://ip-10-1-1-234.ec2.internal:8020/user/hadoop/.sparkStaging/application_1458297951763_0003/__spark_conf__7615049662154628286.zip
16/03/18 11:41:00 INFO spark.SecurityManager: Changing view acls to: hadoop
16/03/18 11:41:00 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/03/18 11:41:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/18 11:41:01 INFO yarn.Client: Submitting application 3 to ResourceManager
16/03/18 11:41:01 INFO impl.YarnClientImpl: Submitted application application_1458297951763_0003
16/03/18 11:41:02 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:02 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458301261052
final status: UNDEFINED
tracking URL: http://ip-10-1-1-234.ec2.internal:20888/proxy/application_1458297951763_0003/
user: hadoop
16/03/18 11:41:03 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:04 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:05 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:06 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:07 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:08 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:09 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:10 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:11 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:12 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:13 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:14 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:15 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:16 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:17 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:18 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:19 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:20 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:21 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:22 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:23 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:24 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:25 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:26 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:27 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:28 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:29 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:30 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:31 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:32 INFO yarn.Client: Application report for application_1458297951763_0003 (state: FAILED)
16/03/18 11:41:32 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458297951763_0003 failed 2 times due to AM Container for appattempt_1458297951763_0003_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://ip-10-1-1-234.ec2.internal:8088/cluster/app/application_1458297951763_0003Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458297951763_0003_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458301261052
final status: FAILED
tracking URL: http://ip-10-1-1-234.ec2.internal:8088/cluster/app/application_1458297951763_0003
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1458297951763_0003 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/18 11:41:32 INFO util.ShutdownHookManager: Shutdown hook called
16/03/18 11:41:32 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-030f9d29-f7ca-42fa-9caf-64ea103a2bb1
Command exiting with ret '1'

Answer

Referring to issue Running Spark Job on Yarn Cluster

It can mean a lot of things, for us, we get the similar error message because of unsupported Java class version, and we fixed the problem by deleting the referenced Java class in our project.

Use this command to see the detailed error message:

yarn logs -application_id application_1458297951763_0003