Robbert Robbert - 2 years ago 194
Python Question

tuple has no attribute 'isdigit'

I need to do some word processing using the NLTK module, and I get this error:
AttributeError: 'tuple' object has no attribute 'isdigit'

Does anybody know how to deal with this error?

Traceback (most recent call last):
  File "", line 36, in <module>
    postoks = nltk.tag.pos_tag(tok)
NameError: name 'tok' is not defined

PS C:\Users\moham\Desktop\Presentation> python
Traceback (most recent call last):
  File "", line 37, in <module>
    postoks = nltk.tag.pos_tag(tok)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\", line 111, in pos_tag
    return _pos_tag(tokens, tagset, tagger)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\", line 82, in _pos_tag
    tagged_tokens = tagger.tag(tokens)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\", line 153, in tag
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\", line 153, in <listcomp>
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\", line 228, in normalize
    elif word.isdigit() and len(word) == 4:
AttributeError: 'tuple' object has no attribute 'isdigit'

import nltk

with open("SHORT-LIST.txt", "r", encoding='utf8') as myfile:
    text = myfile.read().replace('\n', '')

#text = "program managment is complicated issue for human workers"

# Used when tokenizing words
sentence_re = r'''(?x)      # set flag to allow verbose regexps
      ([A-Z])(\.[A-Z])+\.?  # abbreviations, e.g. U.S.A.
    | \w+(-\w+)*            # words with optional internal hyphens
    | \$?\d+(\.\d+)?%?      # currency and percentages, e.g. $12.40, 82%
    | \.\.\.                # ellipsis
    | [][.,;"'?():-_`]      # these are separate tokens
'''

lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()

grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns

    NP:
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
chunker = nltk.RegexpParser(grammar)

tok = nltk.regexp_tokenize(text, sentence_re)

postoks = nltk.tag.pos_tag(tok)

#print (postoks)

tree = chunker.parse(postoks)

from nltk.corpus import stopwords
stopwords = stopwords.words('english')

def leaves(tree):
    """Finds NP (nounphrase) leaf nodes of a chunk tree."""
    for subtree in tree.subtrees(filter = lambda t: t.label()=='NP'):
        yield subtree.leaves()

def normalise(word):
    """Normalises words to lowercase and stems and lemmatizes it."""
    word = word.lower()
    word = stemmer.stem_word(word)
    word = lemmatizer.lemmatize(word)
    return word

def acceptable_word(word):
    """Checks conditions for acceptable word: length, stopword."""
    accepted = bool(2 <= len(word) <= 40
        and word.lower() not in stopwords)
    return accepted

def get_terms(tree):
    for leaf in leaves(tree):
        term = [ normalise(w) for w,t in leaf if acceptable_word(w) ]
        yield term

terms = get_terms(tree)

with open("results.txt", "w+") as logfile:
    for term in terms:
        for word in term:
            result = word
            logfile.write("%s\n" % str(word))
            # print (word),
        # (print)


Answer Source

As of nltk 3.1, which is currently the latest version, the default tagger is the Perceptron tagger. After upgrading, all my nltk.regexp_tokenize calls stopped working correctly and all my nltk.pos_tag calls started raising the error above.

My current workaround is to go back to the previous version, nltk 3.0.1, where both work. I am not sure whether this is a bug in the current release of nltk.

Installation instructions for nltk 3.0.4 on Ubuntu. From your home directory, or any other directory, run the following steps.

$ wget
$ tar -xvzf 3.0.4.tar.gz 
$ cd nltk-3.0.4
$ sudo python3.4 install
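
One possible explanation for the tuples, offered as an assumption rather than something confirmed above: if nltk 3.1's regexp_tokenize behaves like Python's re.findall, then a pattern with capturing groups returns one tuple of groups per match instead of the matched string, and pos_tag later calls .isdigit() on those tuples. The sketch below uses only the standard re module and a shortened version of the pattern from the question; the sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen to hit each alternative in the pattern.
text = "U.S.A. costs $12.40 for well-known items"

# Capturing groups, as in the question's pattern (shortened here).
capturing = r'''(?x)
      ([A-Z])(\.[A-Z])+\.?   # abbreviations, e.g. U.S.A.
    | \w+(-\w+)*             # words with optional internal hyphens
    | \$?\d+(\.\d+)?%?       # currency and percentages
'''

# The same alternatives with every group made non-capturing via (?:...).
non_capturing = r'''(?x)
      (?:[A-Z])(?:\.[A-Z])+\.?
    | \w+(?:-\w+)*
    | \$?\d+(?:\.\d+)?%?
'''

# With capturing groups, findall yields one tuple of groups per match.
tuple_tokens = re.findall(capturing, text)

# Without capturing groups, findall yields the matched strings themselves.
string_tokens = re.findall(non_capturing, text)

print(type(tuple_tokens[0]).__name__)  # tuple
print(string_tokens)
# ['U.S.A.', 'costs', '$12.40', 'for', 'well-known', 'items']
```

If that assumption holds for your nltk version, rewriting the groups in sentence_re as (?:...) may be enough to get string tokens back without downgrading.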