Suhairi Suhaimin - 2 months ago
Python Question

How to change NLTK default wordnet language to zsm?

I'm new to NLTK and I'm working through Chapter 4 of the Python 3 Text Processing with NLTK 3 Cookbook. I've completed "Using WordNet for tagging" and it works fine in the default language, English. I've downloaded the Bahasa (zsm) language data to omw and want to try it in Bahasa using other datasets. Using the same approach, how can I change the default language from English to zsm?

Code that I'm using:

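from nltk.corpus import wordnet
from nltk.probability import FreqDist
from nltk.tag import SequentialBackoffTagger

class WordNetTagger(SequentialBackoffTagger):

    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)

        # Map WordNet part-of-speech codes to treebank-style tags.
        self.wordnet_tag_map = {
            'n': 'NN',
            's': 'JJ',
            'a': 'JJ',
            'r': 'RB',
            'v': 'VB'
        }

    def choose_tag(self, tokens, index, history):
        word = tokens[index]
        fd = FreqDist()

        # Count how often each part of speech occurs among the word's synsets.
        for synset in wordnet.synsets(word):
            fd[synset.pos()] += 1

        # No synsets found: return None so the backoff tagger can take over.
        if not fd: return None
        return self.wordnet_tag_map.get(fd.max())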


Thanks in advance.

Answer

As you seem to have figured out, you don't change the default language; you pass the language you want explicitly (here, lang="zsm") in every call where you don't want the default. If you find this onerous, you could wrap the wordnet object in your own custom class that provides its own defaults.

class MyWordNet:
    def __init__(self, wn):
        self._wordnet = wn

    def synsets(self, word, pos=None, lang="zsm"):
        return self._wordnet.synsets(word, pos=pos, lang=lang)

    # and similarly for any other methods you need

Then you initialize a wrapper object, passing it NLTK's wordnet reader object, and from then on you use the wrapper instead of the original:

wn = MyWordNet(wordnet)
...

for synset in wn.synsets(word):
    ...
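If you'd rather not write out "and similarly for any other methods you need" by hand, one option is to let the wrapper forward everything to the underlying reader and only fill in lang where it's missing. This is just a rough sketch, not something NLTK provides: DefaultLangWordNet, LANG_METHODS and StubReader are names I made up, the stub stands in for nltk.corpus.wordnet so the snippet runs without the WordNet data, and it assumes callers pass lang as a keyword rather than positionally.

```python
class DefaultLangWordNet:
    """Wrap a WordNet-style reader and supply a default `lang` when none is given."""

    # Assumption: these are the reader methods you call that accept a `lang`
    # keyword. Extend the set to match the methods you actually use.
    LANG_METHODS = {"synsets", "lemmas"}

    def __init__(self, reader, lang="zsm"):
        self._reader = reader
        self._lang = lang

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        if name.startswith("_"):
            # Avoid infinite recursion if _reader has not been set yet.
            raise AttributeError(name)
        attr = getattr(self._reader, name)
        if name not in self.LANG_METHODS:
            return attr  # forward everything else unchanged

        def with_default_lang(*args, **kwargs):
            # An explicit lang=... from the caller still wins.
            kwargs.setdefault("lang", self._lang)
            return attr(*args, **kwargs)

        return with_default_lang


class StubReader:
    """Hypothetical stand-in for nltk.corpus.wordnet, for demonstration only."""

    def synsets(self, word, pos=None, lang="eng"):
        return [(word, pos, lang)]

    def langs(self):
        return ["eng", "zsm"]


wn = DefaultLangWordNet(StubReader())
print(wn.synsets("makan"))            # lang defaults to "zsm"
print(wn.synsets("eat", lang="eng"))  # explicit lang overrides the default
print(wn.langs())                     # forwarded to the reader untouched
```

With real data you would pass nltk.corpus.wordnet in place of StubReader() and call wn.synsets(word) inside choose_tag, leaving the rest of the tagger as it is.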