not_a_robot not_a_robot - 1 year ago 66
Python Question

Traversing/navigating downloaded nltk subpackages?

For a particular script I'm running, I need to have installed from nltk the following packages:

req_modules = ['punkt', 'stopwords', 'averaged_perceptron_tagger', 'maxent_ne_chunker']

I know I can check whether stopwords is downloaded, like this:

import nltk
import os

if 'stopwords' in os.listdir(nltk.data.find('corpora')):
    pass  # 'stopwords' is already downloaded

For me, since I've used this corpus before, this works. However, I want to be able to programmatically check whether the other three modules are installed, eventually using something like:

if not all(m in os.listdir(nltk.data.find('models')) for m in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker']):
    pass  # download the ones that aren't already downloaded

They are all labeled as modules in the downloader accessed at nltk.download(). This should be an easy lookup, so I tried something like this to get all downloaded subpackages in one list:

all_downloaded = os.listdir(nltk.data.find("corpora")) + os.listdir(nltk.data.find("models"))

But I get the error LookupError: Resource 'models' not found. How can I search the Models tab in nltk.download() just like I can search Corpora? I assume the naming convention for finding these resources is the same, as "corpora" is the name of the tab seen in the downloader below.

[Screenshot: the NLTK downloader window]


Taking into account the suggestion below, I tried the code below, but still get an ImportError, even though I have exception handling. What is going on there?

req_modules = {'from nltk import punkt': 'punkt', 'from nltk.corpus import stopwords': 'stopwords',
               'from nltk import pos_tag': 'averaged_perceptron_tagger',
               'from nltk import ne_chunk': 'maxent_ne_chunker',
               'from nltk.stem.porter import PorterStemmer': 'porter_test'}

for m in req_modules:
    try:
        print("Trying: %s" % m)
        exec(m)
    except LookupError or ImportError:
        print("Tried: %s. Resource '%s' was not available and is being downloaded.\n" % (m, req_modules[m]))
        nltk.download(req_modules[m])

Edit 2:

I got it to work, never mind. I changed
from nltk import porter_test
to
from nltk.stem.porter import PorterStemmer
and things work smoothly!

Answer Source

Looks like you are confusing nltk modules with the files in the nltk_data directory, which the modules use. When you install nltk, you get all the packages. Various modules and functions require data files, which you fetch into nltk_data with the downloader. (Some of them are in the category "Models", which maybe you are confusing with "modules"?) To figure out which data file to check for, you could run the corresponding function without an nltk_data folder and inspect the error message. For example:

>>> nltk.ne_chunk("anything")
Traceback (most recent call last):
  ...
    raise LookupError(resource_not_found)
LookupError:
  Resource ... not found.  Please use the NLTK Downloader to obtain the ...
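Once you know the resource path from such an error message, you can look for it with nltk.data.find, which raises LookupError when the file is absent. Here is a minimal sketch of that check. The RESOURCE_PATHS mapping and the missing_packages helper are hypothetical names, and the category/name paths are my assumptions about where these packages live under nltk_data; they may differ between NLTK versions.

```python
# Assumed mapping from downloader package name to its path under nltk_data.
# These paths are illustrative and may not match every NLTK version.
RESOURCE_PATHS = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
}


def missing_packages(resource_paths, find=None):
    """Return the package names whose data files cannot be located."""
    if find is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk
        find = nltk.data.find

    missing = []
    for package, path in resource_paths.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            missing.append(package)
    return missing
```

Calling missing_packages(RESOURCE_PATHS) with no second argument would use nltk.data.find, and each name it returns could then be passed to nltk.download.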

But if it were me, I would not mess with the data files directly. Instead, just try out the service you want and see if it raises an error:

try:
    nltk.ne_chunk([("anything", "NN")])  # ne_chunk expects POS-tagged tokens
except LookupError:
    nltk.download("maxent_ne_chunker")
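If you want to catch more than one exception type this way, note that except LookupError or ImportError does not do that: Python evaluates the or expression first, which yields LookupError, so an ImportError passes straight through. A tuple catches either. Below is a rough sketch of a reusable version; ensure_available and its check/download parameters are hypothetical names I made up for illustration, not part of nltk.

```python
def ensure_available(check, package, download):
    """Run check(); if it fails for lack of code or data, call download(package).

    Returns True when check() succeeded, False when a download was triggered.
    """
    try:
        check()
    except (LookupError, ImportError):  # a tuple catches either exception
        download(package)
        return False
    return True
```

With nltk imported, you could call it as ensure_available(lambda: nltk.corpus.stopwords.words("english"), "stopwords", nltk.download), and similarly for the other resources.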