Zubair Zubair - 6 months ago 59
Python Question

Extracting noun phrases with NLTK using Python

I am new to both Python and NLTK. I adapted the code from https://gist.github.com/alexbowe/879414 into the code given below so that it can run over many documents or text chunks, but I get the following error:

Traceback (most recent call last):
  File "E:/NLP/PythonProgrames/NPExtractor/AdvanceMain.py", line 16, in <module>
    result = np_extractor.extract()
  File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 67, in extract
    for term in terms:
  File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 60, in get_terms
    for leaf in self.leaves(tree):
TypeError: leaves() takes 1 positional argument but 2 were given

Can anyone help me fix this problem? I have to extract noun phrases from millions of product reviews. I tried the Stanford NLP toolkit in Java, but it was extremely slow, so I thought NLTK in Python would be better. Please also recommend a better solution if there is one.

import nltk
from nltk.corpus import stopwords

stopwords = stopwords.words('english')
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns

    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()

class NounPhraseExtractor(object):

    def __init__(self, sentence):
        self.sentence = sentence

    def execute(self):
        # Taken from Su Nam Kim Paper...
        chunker = nltk.RegexpParser(grammar)
        #toks = nltk.regexp_tokenize(text, sentence_re)
        # #postoks = nltk.tag.pos_tag(toks)
        toks = nltk.word_tokenize(self.sentence)
        postoks = nltk.tag.pos_tag(toks)
        tree = chunker.parse(postoks)
        return tree

    def leaves(tree):
        """Finds NP (nounphrase) leaf nodes of a chunk tree."""
        for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
            yield subtree.leaves()

    def normalise(word):
        """Normalises words to lowercase and stems and lemmatizes it."""
        word = word.lower()
        word = stemmer.stem_word(word)
        word = lemmatizer.lemmatize(word)
        return word

    def acceptable_word(word):
        """Checks conditions for acceptable word: length, stopword."""
        accepted = bool(2 <= len(word) <= 40
                        and word.lower() not in stopwords)
        return accepted

    def get_terms(self, tree):
        for leaf in self.leaves(tree):
            term = [self.normalise(w) for w, t in leaf if self.acceptable_word(w)]
            yield term

    def extract(self):
        terms = self.get_terms(self.execute())
        matches = []
        for term in terms:
            for word in term:
                matches.append(word)
        return matches


You need to either:

  • decorate each of normalise, acceptable_word, and leaves with @staticmethod, or
  • add a self parameter as the first parameter of these methods.
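
As a minimal sketch of both options side by side: the class `Extractor` and the tiny `STOPWORDS` set below are hypothetical stand-ins, not the NLTK-based code from the question, and stemming and lemmatizing are left out. Either form lets the helpers be called through `self`.

```python
STOPWORDS = {"the", "a", "of"}  # hypothetical stand-in for NLTK's stopword list


class Extractor:
    # Option 1: @staticmethod, so no instance is passed implicitly.
    @staticmethod
    def acceptable_word(word):
        """Accept words of 2 to 40 characters that are not stopwords."""
        return 2 <= len(word) <= 40 and word.lower() not in STOPWORDS

    # Option 2: a regular method that takes self as its first parameter.
    def normalise(self, word):
        """Lowercase the word (stemming and lemmatizing omitted in this sketch)."""
        return word.lower()

    def get_terms(self, leaves):
        # Both helpers can now be called through self without a TypeError.
        for leaf in leaves:
            yield [self.normalise(w) for w, _tag in leaf if self.acceptable_word(w)]


# Each leaf is a list of (word, POS tag) pairs, the shape a chunker would produce.
leaves = [[("The", "DT"), ("Battery", "NN"), ("Life", "NN")]]
terms = list(Extractor().get_terms(leaves))
print(terms)  # [['battery', 'life']]
```

Which option to pick is mostly a matter of taste here: none of the three helpers reads instance state, so @staticmethod describes them a little more accurately.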

You're calling self.leaves, which passes self as an implicit first argument to the leaves method. Your method declares only a single parameter, so Python sees two arguments where one is expected. Making these static methods, or adding a self parameter, will fix the issue.
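
A small reproduction may make the off-by-one clearer. The class `Demo` is hypothetical and only mirrors the shape of the `leaves` method in the question. Calling `obj.leaves(x)` behaves like `Demo.leaves(obj, x)`, so two arguments arrive at a function that declares one.

```python
class Demo:
    def leaves(tree):  # no self: on an instance call, the instance lands in `tree`
        return tree


obj = Demo()

# Calling through the instance passes obj implicitly, plus our own argument.
try:
    obj.leaves("some tree")
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # ends with "takes 1 positional argument but 2 were given"

# Calling through the class passes nothing implicitly, so one argument fits.
print(Demo.leaves("some tree"))
```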

(Your later calls to self.acceptable_word and self.normalise will have the same issue.)

You can read more about static methods in the Python documentation, or on an external site that may be easier to digest.