Zubair Zubair - 1 year ago 292
Python Question

Extracting noun phrases from NLTK using python

I am new to both python and nltk. I have converted the code from https://gist.github.com/alexbowe/879414 to the below given code to make it run for many documents/text chunks. But I got the following error

Traceback (most recent call last):
File "E:/NLP/PythonProgrames/NPExtractor/AdvanceMain.py", line 16, in <module>
result = np_extractor.extract()
File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 67, in extract
for term in terms:
File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 60, in get_terms
for leaf in self.leaves(tree):
TypeError: leaves() takes 1 positional argument but 2 were given

Can any one help me to fix this problem. I have to extract noun phrases from millions of product reviews. I used Standford NLP kit using Java, but it was extremely slow, so I thought using nltk in python will be better. Please also recommend if there is any better solution.

import nltk
from nltk.corpus import stopwords
stopwords = stopwords.words('english')
grammar = r"""
{<NN.*|JJ>*<NN.*>} # Nouns and Adjectives, terminated with Nouns
{<NBAR><IN><NBAR>} # Above, connected with in/of/etc...
lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()

class NounPhraseExtractor(object):

def __init__(self, sentence):
self.sentence = sentence

def execute(self):
# Taken from Su Nam Kim Paper...
chunker = nltk.RegexpParser(grammar)
#toks = nltk.regexp_tokenize(text, sentence_re)
# #postoks = nltk.tag.pos_tag(toks)
toks = nltk.word_tokenize(self.sentence)
postoks = nltk.tag.pos_tag(toks)
tree = chunker.parse(postoks)
return tree

def leaves(tree):
"""Finds NP (nounphrase) leaf nodes of a chunk tree."""
for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
yield subtree.leaves()

def normalise(word):
"""Normalises words to lowercase and stems and lemmatizes it."""
word = word.lower()
word = stemmer.stem_word(word)
word = lemmatizer.lemmatize(word)
return word

def acceptable_word(word):
"""Checks conditions for acceptable word: length, stopword."""
accepted = bool(2 <= len(word) <= 40
and word.lower() not in stopwords)
return accepted

def get_terms(self,tree):
for leaf in self.leaves(tree):
term = [self.normalise(w) for w, t in leaf if self.acceptable_word(w)]
yield term

def extract(self):
terms = self.get_terms(self.execute())
matches = []
for term in terms:
for word in term:
return matches

Answer Source

You need to either:

  • decorate each of normalize, acceptable_word, and leaves with @staticmethod, or
  • add a self parameter as the first parameter of these methods.

You're calling self.leaves which will pass self as an implicit first parameter to the leaves method (but your method only takes a single parameter). Making these static methods, or adding a self parameter will fix this issue.

(your later calls to self.acceptable_word,and self.normalize will have the same issue)

You could read about Python's static methods in their docs, or possibly from an external site that may be easier to digest.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download