Arjun Mukherjee - 10 days ago
Scala Question

Elastic Beanstalk Docker app, writing local files

I have a Play (Scala) application that I am deploying as a generic Docker application via the AWS Elastic Beanstalk console. When I run the application locally I don't see any issues, so I assume the code is correct.
I need to be able to:

  • Write files to local disk
  • Run some command line utilities (like ffmpeg) on those files

However, when deployed, my application neither writes to the local disk nor lets me execute command line utilities.

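import java.io.File
import scala.sys.process._ // provides the `!` method on String
import com.amazonaws.services.s3.model.GetObjectRequest
val localFile = new File(s"$localFilePath/$siteId/download/$fileName.raw")
s3Client.getObject(new GetObjectRequest(bucketName, summary.getKey), localFile)
val cmd = s"ffmpeg -i ${localFile.getAbsolutePath} -vcodec copy ${localFile.getAbsolutePath}.mp4"
cmd.! // runs the command, blocks until it finishes, and returns the exit code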


This is my Dockerrun.aws.json file

{"AWSEBDockerrunVersion": "1","Logging": "/opt/docker/logs"}


UPDATE
I updated my Dockerrun.aws.json file to include a volume mapping:

{"AWSEBDockerrunVersion": "1","Logging": "/opt/docker/logs","Volumes":[{"HostDirectory": "/tmp/files","ContainerDirectory": "/tmp/files"}]}


The app now writes local files, but it still cannot run ffmpeg:

Exception : Cannot run program "ffmpeg": error=2, No such file or directory

Answer

TL;DR - your application can't find ffmpeg because you installed it on the host operating system, while your code runs inside the container. To fix this, install ffmpeg in the container image by writing a custom Dockerfile.

Software Containers

To better understand your problem, you need to know that software containers are a special kind of virtualization. The environment inside a container (its filesystem, processes, and installed software) is separate from the host and from other containers on the same machine, be it your laptop or your EC2 instance. Containers can share some information with the host or with other containers on the same machine, but you have to set that up explicitly (e.g. with Docker volumes).

Docker containers are a specific type of software container. Some basic information about Docker and containers in general can be found on the official What is Docker page.

Containers vs VMs

Containers are somewhat similar to virtual machines, but in my opinion there are more differences than similarities between the two. They are similar in that both let you run multiple applications on the same hardware while each application gets its own separate, virtual environment. They differ in the kind of virtualization: an application in a VM shares only the hardware with other applications on the same host, because every VM runs its own operating system, whereas an application inside a container shares both the hardware and the operating system kernel of the host.

Code in a Container is Isolated!

Docker containers provide a virtually separate environment for every application by leveraging Linux kernel features such as namespaces and cgroups. Each container lives in an isolated environment, which makes it feel as if it has its own filesystem, networking, process IDs, etc. This means that what happens inside a container normally does not affect the host, and vice versa.

Therefore, when working with Docker containers you normally don't do much on the host. In your case, it would be wrong to install ffmpeg on the host and then try to use it from within the container. Even though that is possible, it defeats the whole purpose of using Docker. The correct approach is to install all your dependencies inside the container. Not only will this solve your "No such file or directory" problem, it will also let you run your container anywhere you like (AWS, GCP, your laptop...) and have it work the same way everywhere. This portability is one of the main reasons people use containers: your code behaves consistently because it always runs in the same environment.
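To confirm from inside the container whether ffmpeg is actually available, a small check like the following may help. This is only a sketch: the object name FfmpegCheck and the methods findOnPath and remux are made up for illustration and are not part of Play, the AWS SDK, or the original code.

```scala
import java.io.File
import scala.sys.process._

// Hypothetical helper (names are illustrative, not from the question)
object FfmpegCheck {

  /** Returns the first executable file called `name` in the given PATH string, if any. */
  def findOnPath(name: String, path: String = sys.env.getOrElse("PATH", "")): Option[File] =
    path.split(File.pathSeparator).toSeq
      .filter(_.nonEmpty)
      .map(dir => new File(dir, name))
      .find(f => f.isFile && f.canExecute)

  /** Runs ffmpeg if it is installed; otherwise reports the missing dependency. */
  def remux(input: File): Either[String, Int] =
    findOnPath("ffmpeg") match {
      case Some(ffmpeg) =>
        // Passing the arguments as a Seq avoids problems with spaces in file names
        val exitCode = Seq(
          ffmpeg.getPath, "-i", input.getAbsolutePath,
          "-vcodec", "copy", s"${input.getAbsolutePath}.mp4"
        ).!
        Right(exitCode)
      case None =>
        Left("ffmpeg was not found on PATH inside this container; install it in the image")
    }
}
```

If remux returns a Left when the app runs on Elastic Beanstalk but a Right on your laptop, the binary is missing from the image rather than the code being wrong.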

Pack Your Environment With the Code

Another way of looking at it: when using Docker, you pack your execution environment together with the code. This is done in an efficient and compact way, using a Dockerfile. You simply include a Dockerfile in the root directory of your code. The Dockerfile contains all the instructions required to create the environment your code needs in order to run. In the Dockerfile you install dependencies, copy files, change permissions, and so on. Then you build an image from which you can launch containers.

Using Dockerfiles also lets you version control your dependencies and provides a nice replacement for configuration management tools like Chef, Ansible and Puppet.
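For your case, a Dockerfile along these lines might be a starting point. It is a minimal sketch under several assumptions that are not in the question: a Debian/Ubuntu-based JRE base image, an application staged with `sbt stage`, Play's default port 9000, and a launcher script called my-app (a placeholder for your project's name).

```dockerfile
# Assumed base image; any Debian/Ubuntu-based JRE image should work similarly
FROM eclipse-temurin:8-jre

# Install ffmpeg inside the image so the application can find it on PATH
RUN apt-get update \
 && apt-get install -y --no-install-recommends ffmpeg \
 && rm -rf /var/lib/apt/lists/*

# Assumption: the app was staged with `sbt stage` into target/universal/stage
COPY target/universal/stage /opt/app
WORKDIR /opt/app

# Play listens on port 9000 by default
EXPOSE 9000

# "my-app" is a placeholder for the launcher script that sbt generates
CMD ["/opt/app/bin/my-app"]
```

The important part is the RUN instruction: ffmpeg becomes part of the image, so every container started from it has the binary available, no matter which host it runs on.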

I suggest you take a look at the official training videos on the Docker website. They will give you a better understanding of what Docker is and how to work with it.

Conclusion

To conclude, if you simply want to run your code on Elastic Beanstalk, you don't have to use Docker. You can probably get it to run on Beanstalk's Java environment (though I've never done that myself). However, if you want to Dockerize your application, you first need a better understanding of Docker, since working with containers requires a shift in perspective that is not always easy to make.