tietze111 - 11 months ago
Scala Question

Submitting a jar to sequenceiq docker-spark container

So I've never worked with Spark or Docker before, but I have to use them for a project.
I'm trying to understand how this works. I built a fat jar in Eclipse and was hoping to submit it to my Docker container, which I set up using this guide: https://github.com/sequenceiq/docker-spark

Now, I don't really understand how to get my jar from my local system into my Docker container and then run it there.

I think I'm missing how this all really works together, but maybe someone can clear it up.

I would be very thankful!

Answer Source

As far as I know, there are two possibilities:

1. Extend the sequenceiq image and create your own Docker image

I think the best way is to "extend" the sequenceiq Spark Docker image and COPY your Spark application into the image during the build phase.

So your Dockerfile should look something like this:

FROM sequenceiq/spark:1.6.0
COPY sparkapplication.jar sparkapplication.jar
COPY bootstrap.sh /etc/bootstrap.sh
ENTRYPOINT ["/etc/bootstrap.sh"]

Then you need to create/edit bootstrap.sh (the entrypoint) to include the spark-submit command. You can start from their bootstrap.sh and add your spark-submit command somewhere near the end of the file, something like:

$SPARK_HOME/bin/spark-submit \
  --class MainClass \
  --master local[*] \
  sparkapplication.jar

Just put your sparkapplication.jar and bootstrap.sh next to the Dockerfile (in the same folder).
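With the Dockerfile, sparkapplication.jar, and bootstrap.sh in one folder, building and running the extended image could look like this (the tag myapp/spark is just an illustrative name, not anything from the guide):

```shell
# Build the extended image from the folder containing the Dockerfile
docker build -t myapp/spark .

# Run it; the entrypoint (bootstrap.sh) starts the Hadoop/Spark services
# and then executes the spark-submit command you appended to it
docker run -it -h sandbox myapp/spark
```
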

2. Manually copy the Spark application into a running container

The second option is to use their Docker container as-is and copy your application into it:

docker cp sparkapplication.jar container:/sparkapplication.jar

Then exec into (attach to) the container (docker exec -it container bash) and run spark-submit manually.
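Putting the second option together, the whole manual flow might look like the sketch below; the container name my-spark is an assumption, and the spark-submit path assumes SPARK_HOME is on the PATH inside the sequenceiq image (check with `echo $SPARK_HOME` in the container if it isn't):

```shell
# Start the sequenceiq container in the background with a known name
# (the trailing -d is passed to bootstrap.sh to keep it running as a daemon)
docker run -d --name my-spark -h sandbox sequenceiq/spark:1.6.0 -d

# Copy the fat jar from the host into the container's filesystem
docker cp sparkapplication.jar my-spark:/sparkapplication.jar

# Run spark-submit inside the container against the copied jar
docker exec -it my-spark spark-submit \
  --class MainClass \
  --master local[*] \
  /sparkapplication.jar
```
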