TW1411 - 1 year ago

Docker NLTK Download

I am building a docker container using the following Dockerfile:

FROM ubuntu:14.04

RUN apt-get update

RUN apt-get install -y python python-dev python-pip

ADD . /app

RUN apt-get install -y python-scipy

RUN pip install -r /app/requirements.txt

EXPOSE 5000

WORKDIR /app

CMD python app.py


Everything goes well until I run the image and get the following error:

**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************


I have had this problem before and it is discussed here; however, I am not sure how to approach it using Docker. I have tried:

CMD python
CMD import nltk
CMD nltk.download()


as well as:

CMD python -m nltk.downloader -d /usr/share/nltk_data popular


But I am still getting the error.

Answer

In your Dockerfile, try adding this RUN instruction instead:

RUN python -m nltk.downloader punkt

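This runs the downloader during the build and, by default, saves the requested data in the nltk_data folder of the current user's home directory. When the build runs as root that is /root/nltk_data, the first location listed under "Searched in:" in your error. Put the line after the pip install line, because nltk has to be installed before its downloader can run (assuming nltk is listed in your requirements.txt).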
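If you would rather keep the download in a Python script, or want the build to stop with a clear message when a download fails, here is a minimal sketch. It assumes nltk is installed by requirements.txt and downloads into /usr/share/nltk_data, which is one of the directories listed under "Searched in:" in the error, so NLTK finds the data at runtime without extra configuration. The file name download_nltk_data.py and the helpers download_packages and main are hypothetical; only nltk.download(package, download_dir=...) is NLTK's own function, and it returns True on success and False on failure.

```python
import sys

# Assumption: /usr/share/nltk_data is on NLTK's default search path
# (it appears in the "Searched in:" list of the error message).
DOWNLOAD_DIR = "/usr/share/nltk_data"
PACKAGES = ["punkt"]


def download_packages(packages, download_dir, downloader=None):
    """Download each NLTK package into download_dir; return the ones that failed.

    `downloader` defaults to nltk.download. It is a parameter only so the
    function can be exercised without nltk or network access.
    """
    if downloader is None:
        import nltk  # imported lazily so this module loads even without nltk
        downloader = nltk.download
    failed = []
    for package in packages:
        # nltk.download returns True on success and False on failure.
        if not downloader(package, download_dir=download_dir):
            failed.append(package)
    return failed


def main():
    failed = download_packages(PACKAGES, DOWNLOAD_DIR)
    if failed:
        # sys.exit with a string prints it and exits with status 1,
        # which makes the RUN step (and therefore the build) fail.
        sys.exit("Failed to download NLTK data: " + ", ".join(failed))
```

To use it, end the file with the usual `if __name__ == "__main__": main()` guard and add `RUN python /app/download_nltk_data.py` after the pip install line. Because the data is written during docker build, it is baked into the image and is already there when `CMD python app.py` starts the app.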

The problem is most likely that you used CMD instead of RUN in the Dockerfile. From the Docker documentation for CMD:

The main purpose of a CMD is to provide defaults for an executing container.

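CMD is used when the container starts (docker run <image>), not during the build, so nothing in a CMD line is executed by docker build. Also, only the last CMD in a Dockerfile takes effect, so your other CMD lines were most likely overridden by the final CMD python app.py line.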
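Docker uses only the last CMD instruction in a Dockerfile. The small stdlib-only sketch below illustrates that rule. effective_cmd is a hypothetical helper, not part of Docker or any library, and it is deliberately simplified: it ignores line continuations and multi-stage builds, returns the raw argument text (exec-form JSON is not parsed), and does not model `docker run <image> <command>`, which replaces CMD entirely. The example assumes the attempted CMD lines were placed above the final one.

```python
def effective_cmd(dockerfile_text):
    """Return the argument of the last CMD instruction, or None if there is none."""
    cmd = None
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        # Dockerfile instructions are case-insensitive (uppercase is convention).
        if parts[0].upper() == "CMD":
            # Each CMD replaces the previous one, so only the last survives.
            cmd = parts[1] if len(parts) > 1 else ""
    return cmd


# The question's Dockerfile with the attempted CMD lines added before the last one.
EXAMPLE = """\
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-dev python-pip
ADD . /app
CMD python
CMD import nltk
CMD nltk.download()
WORKDIR /app
CMD python app.py
"""

print(effective_cmd(EXAMPLE))  # python app.py
```

Whatever the result, a CMD line never runs at build time; a download that must be part of the image belongs in a RUN instruction.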
