Submitting a jar to sequenceiq docker-spark container

So, I have never worked with Spark or Docker, but I have to use them for a project.
I'm trying to understand how this works. I built a fat jar in Eclipse and was hoping to submit it to my Docker container, which I set up using this guide: https://github.com/sequenceiq/docker-spark

Now, I don't really understand how to get my jar from my local system into my Docker container and then run it.

I think I'm missing how this all really works together, but maybe someone can clear it up.

I would be very thankful!

Answer

As far as I know, there are two possibilities:

1. Extend the sequenceiq image and create your own Docker image

I think the best way is to "extend" the sequenceiq Spark Docker image and COPY your Spark application into it when the image is built.

So your Dockerfile should look something like:

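FROM sequenceiq/spark:1.6.0
# Copy the application jar to the root of the image
COPY sparkapplication.jar /sparkapplication.jar
# Replace the image's entrypoint script with your edited copy
COPY bootstrap.sh /etc/bootstrap.sh
ENTRYPOINT ["/etc/bootstrap.sh"]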

You also need to create (or edit) bootstrap.sh, the entrypoint script, so that it runs spark-submit.

You can start from their bootstrap.sh and add your spark-submit command somewhere near the end of the file, something like:

# Replace MainClass with the fully qualified name of your main class
$SPARK_HOME/bin/spark-submit \
  --class MainClass \
  --master "local[*]" \
  /sparkapplication.jar

Just put your sparkapplication.jar and bootstrap.sh next to the Dockerfile (in the same folder).
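
To build and start the extended image, a small POSIX sh helper like the one below should work. It is only a sketch under assumptions: the image tag my-spark-app and the function names run and build_and_run are hypothetical, and setting DRY_RUN=1 just prints the commands so you can check them before running anything.

```shell
# Hypothetical helper for option 1: build the extended image and run it.
# Assumes the current folder holds Dockerfile, bootstrap.sh and sparkapplication.jar.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_and_run() {
  # "my-spark-app" is an illustrative default tag; pass your own as $1.
  tag="${1:-my-spark-app}"
  # COPY keeps file permissions, so the entrypoint script must be executable on the host.
  run chmod +x bootstrap.sh
  # Build the image from the Dockerfile in the current directory.
  run docker build -t "$tag" .
  # Start a throwaway container; its entrypoint (bootstrap.sh) runs spark-submit.
  run docker run --rm "$tag"
}
```

For example, `DRY_RUN=1 build_and_run` previews the three commands, and `build_and_run` then runs them for real.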

2. Manually copy the Spark application into a running container

The second option is to use their image unchanged and copy the application into a running container:

docker cp sparkapplication.jar container:/sparkapplication.jar

Here container is the name or ID of your running container. Then open a shell in the container (docker exec -it container bash) and run spark-submit manually.
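
The copy-and-submit steps of option 2 can also be wrapped in a small POSIX sh function, which avoids the interactive shell. This is a sketch under assumptions: the names run and submit_jar are hypothetical, the container must already be running (for example one started from sequenceiq/spark:1.6.0), and the image is assumed to set SPARK_HOME. The main class and jar path are passed to sh -c as positional parameters, so they do not need to be re-quoted inside the command string.

```shell
# Hypothetical helper for option 2: copy a jar into a running container and submit it.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: submit_jar CONTAINER JAR MAIN_CLASS
submit_jar() {
  # $1 = container name or ID, $2 = jar path on the host, $3 = main class
  target="/$(basename "$2")"
  # Same as the docker cp command above, with the names passed in.
  run docker cp "$2" "$1:$target"
  # Single quotes keep $SPARK_HOME from being expanded on the host;
  # it is expanded inside the container (assumes the image sets SPARK_HOME).
  run docker exec "$1" sh -c '"$SPARK_HOME"/bin/spark-submit --class "$1" --master "local[*]" "$2"' sh "$3" "$target"
}
```

For example, `submit_jar my-spark-container target/sparkapplication.jar com.example.Main`, where all three arguments are placeholders for your own values.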

