user2870222 - 3 months ago
Python Question

Why isn't the 'chunk_types' parameter working in ConllChunkCorpusReader?

NLTK's ConllChunkCorpusReader class has a parameter called chunk_types. I expected it to return only the chunks of the given type from the text, but I don't know what exactly chunk_types is for.

text = '''
Mr. NNP B-NP
Meador NNP I-NP
had VBD B-VP
been VBN I-VP
executive JJ B-NP
vice NN I-NP
president NN I-NP
of IN B-PP
Balcor NNP B-NP
. . O'''


After loading a ConllChunkCorpusReader as reader, I get results like the one below.

>>> reader.chunked_sents(chunk_types='NP')
[Tree('S', [Tree('NP', [('Mr.', 'NNP'), ('Meador', 'NNP')]), ('had', 'VBD'),
('been', 'VBN'), Tree('NP', [('executive', 'JJ'), ('vice', 'NN'), ('president', 'NN')]),
('of', 'IN'), Tree('NP', [('Balcor', 'NNP')]), ('.', '.')])]


But I am looking for output with only NP chunks, as below.

>>> reader.chunked_sents(chunk_types='NP')
[Tree('NP', [('Mr.', 'NNP'), ('Meador', 'NNP')]),
Tree('NP', [('executive', 'JJ'), ('vice', 'NN'), ('president', 'NN')]),
Tree('NP', [('Balcor', 'NNP')])]

Answer

A chunked tree is a tree with at most three levels: The root of the tree (the node S), with children that are either lexical items or chunks; and each chunk in turn is a tree of depth 1, with lexical items as children.

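If you look carefully, you will see that the VP chunk in your input has disappeared from the output: the root of the tree is connected directly to the lexical items ('had', 'VBD') and ('been', 'VBN'). The same happened to the PP chunk around ('of', 'IN'). That's what chunk_types does: it selects which chunk types are kept as subtrees, and tokens belonging to any other chunk type are attached directly to the root. It does not drop tokens from the sentence.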
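To see the effect without setting up a corpus file, here is a minimal sketch that parses the question's text with nltk.chunk.util.conllstr2tree, which takes a similar chunk_types argument. It is a stand-in for the reader, not the reader itself, and the variable names are only illustrative. A tuple is passed because this helper checks membership per chunk type.

```python
from nltk.chunk.util import conllstr2tree
from nltk.tree import Tree

# The CoNLL-style text from the question: word, POS tag, IOB chunk tag.
text = """Mr. NNP B-NP
Meador NNP I-NP
had VBD B-VP
been VBN I-VP
executive JJ B-NP
vice NN I-NP
president NN I-NP
of IN B-PP
Balcor NNP B-NP
. . O"""

# Keep only NP chunks as subtrees; VP and PP tokens hang off the root.
np_only = conllstr2tree(text, chunk_types=("NP",))

# Keep NP and VP chunks; only the PP token hangs off the root.
np_and_vp = conllstr2tree(text, chunk_types=("NP", "VP"))

# Collect the labels of the chunk subtrees directly under the root.
np_only_labels = [c.label() for c in np_only if isinstance(c, Tree)]
np_and_vp_labels = [c.label() for c in np_and_vp if isinstance(c, Tree)]

print(np_only_labels)
print(np_and_vp_labels)
```

In both trees every token is still present; only the grouping into chunks changes.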

You can visualize the tree returned by the reader by printing it or calling its draw() method:

>>> trees = reader.chunked_sents(chunk_types='NP')
>>> print(trees[0])
(S
  (NP Mr./NNP Meador/NNP)
  had/VBD
  been/VBN
  (NP executive/JJ vice/NN president/NN)
  of/IN
  (NP Balcor/NNP)
  ./.)
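If what you want is a flat list of the NP chunks only, one possible approach is to filter the subtrees of each returned sentence tree. The sketch below rebuilds the tree from the question by hand so it runs on its own; the names sentence and np_chunks are illustrative.

```python
from nltk.tree import Tree

# The tree returned for the example sentence, copied from the question.
sentence = Tree('S', [
    Tree('NP', [('Mr.', 'NNP'), ('Meador', 'NNP')]),
    ('had', 'VBD'),
    ('been', 'VBN'),
    Tree('NP', [('executive', 'JJ'), ('vice', 'NN'), ('president', 'NN')]),
    ('of', 'IN'),
    Tree('NP', [('Balcor', 'NNP')]),
    ('.', '.'),
])

# subtrees() walks the root and every nested subtree; the filter keeps
# only those labelled NP, so the root S and loose tokens are skipped.
np_chunks = list(sentence.subtrees(filter=lambda t: t.label() == 'NP'))

for chunk in np_chunks:
    print(chunk)
```

With a real reader, the same filter could be applied to each tree in reader.chunked_sents(chunk_types='NP').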