I've never worked with Spark or Docker before, but I have to use them for a project.
I'm trying to understand how this works. I build a fat JAR in Eclipse and was hoping to submit it to my Docker container, which I set up using this guide: https://github.com/sequenceiq/docker-spark
What I don't understand is how to get my JAR from my local system into the Docker container and then run it there.
I think I'm missing how all the pieces fit together; maybe someone can clear it up.
I would be very thankful!
As far as I know, there are two possibilities:
1. Extend the sequenceiq image and create your own Docker image
I think the best way is to "extend" the sequenceiq Spark Docker image and COPY your Spark application into it during the image build.
Your Dockerfile should look something like this:
```
FROM sequenceiq/spark:1.6.0
COPY sparkapplication.jar /sparkapplication.jar
COPY bootstrap.sh /etc/bootstrap.sh
ENTRYPOINT ["/etc/bootstrap.sh"]
```
You also need to create/edit bootstrap.sh (the entrypoint). You can take their bootstrap.sh and add your spark-submit command somewhere near the end of the file, something like:
```
$SPARK_HOME/bin/spark-submit \
  --class MainClass \
  --master local[*] \
  /sparkapplication.jar
```
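For context, a minimal sketch of what the end of the modified bootstrap.sh could look like (MainClass is a placeholder for your application's entry point; everything above the appended lines stays as sequenceiq ships it):

```shell
#!/bin/bash
# ... original sequenceiq bootstrap.sh contents (starts the container's services) ...

# Appended: submit the bundled application once the container is up.
# MainClass is a placeholder -- replace it with your application's main class.
$SPARK_HOME/bin/spark-submit \
  --class MainClass \
  --master local[*] \
  /sparkapplication.jar
```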
Just put your bootstrap.sh next to the Dockerfile (in the same folder).
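With the Dockerfile, sparkapplication.jar, and bootstrap.sh all in one folder, you then build and run your own image. A sketch of the two commands (the image tag `myname/spark-app` is just an example name):

```shell
# Build the extended image from the folder containing the three files
docker build -t myname/spark-app .

# Run it; the ENTRYPOINT (bootstrap.sh) starts the services and submits the job
docker run -it myname/spark-app
```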
2. Manually copy the Spark application into a running container
The second option is to use their Docker image as-is and copy your application into the running container:
```
docker cp sparkapplication.jar container:/sparkapplication.jar
```
Then exec (attach) into the container (`docker exec -it container bash`) and execute the spark-submit command manually.
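Putting the second option together, the full sequence could look like this (the container name `spark` and MainClass are example names, not anything the image requires):

```shell
# Start the sequenceiq container with a known name (roughly as in their README)
docker run -it --name spark sequenceiq/spark:1.6.0 bash

# From another terminal on the host: copy the jar into the running container
docker cp sparkapplication.jar spark:/sparkapplication.jar

# Attach to the container and submit the job from inside it
docker exec -it spark bash
$SPARK_HOME/bin/spark-submit --class MainClass --master local[*] /sparkapplication.jar
```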