RAVI - 5 months ago

NLTK - Download all NLTK data except corpora from the command line without the Downloader UI

We can download all NLTK data using:

> import nltk
> nltk.download('all')

Or specific data using:

> nltk.download('punkt')
> nltk.download('maxent_treebank_pos_tagger')

But I want to download all data except the 'corpora' packages,
for example all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.

Is there any way to do this without the Downloader UI? Something like:

> nltk.download('all-taggers')


Create a Downloader instance, list all corpus package ids, and set _status_cache[pkg.id] = 'installed' for each one.

This marks every corpus as 'installed', so all corpus packages will be skipped when download('all') is called on that same Downloader instance.

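import nltk

dwlr = nltk.downloader.Downloader()

# Mark every corpus package as already installed in this Downloader's
# status cache (a private attribute), so download('all') skips them.
for pkg in dwlr.corpora():
    dwlr._status_cache[pkg.id] = 'installed'

# Call download() on the same instance: nltk.download() uses a separate
# module-level Downloader that does not see this cache.
dwlr.download('all')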
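An alternative sketch that avoids touching the private _status_cache: it assumes Downloader.packages() returns package objects with id and subdir attributes, where subdir is the data folder the package installs into (e.g. 'corpora', 'taggers', 'tokenizers'). It downloads only the ids whose subdir is not 'corpora'. The helper names select_package_ids and download_all_except_corpora are hypothetical, not part of NLTK.

```python
def select_package_ids(packages, include=None, exclude=("corpora",)):
    """Hypothetical helper: pick package ids by their install subdirectory.

    packages: iterable of objects with .id and .subdir attributes
              (assumed to match what nltk's Downloader.packages() returns).
    include:  if given, keep only packages whose subdir is in this set.
    exclude:  drop packages whose subdir is in this set.
    """
    ids = []
    for pkg in packages:
        if include is not None and pkg.subdir not in include:
            continue
        if pkg.subdir in exclude:
            continue
        ids.append(pkg.id)
    return sorted(ids)


def download_all_except_corpora():
    """Hypothetical wrapper: download every package outside 'corpora'."""
    import nltk  # imported lazily so select_package_ids has no nltk dependency

    dwlr = nltk.downloader.Downloader()
    # packages() may fetch the package index over the network first.
    selected = select_package_ids(dwlr.packages())
    for pkg_id in selected:
        dwlr.download(pkg_id)
    return selected
```

Calling download_all_except_corpora() runs the download. Passing include={'taggers'} to select_package_ids instead approximates the 'all-taggers' idea from the question, by filtering on the subdirectory rather than relying on a named collection.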