As part of my academic project I need to parse a bunch of arbitrary sentences into a dependency graph. After a lot of searching, I found that I can use MaltParser to parse text with its pre-trained grammar.
I have downloaded the pre-trained model (engmalt.linear-1.7.mco) from http://www.maltparser.org/mco/mco.html, but I don't know how to parse my sentences using this grammar file and MaltParser (through the Python interface for Malt). I have downloaded the latest version of MaltParser (1.7.2) and moved it to '/usr/lib/'.
txt="This is a test sentence"
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 88, in raw_parse
return self.parse(words, verbose)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 75, in parse
return self.tagged_parse(taggedwords, verbose)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 122, in tagged_parse
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/dependencygraph.py", line 121, in load
IOError: [Errno 2] No such file or directory: '/tmp/malt_output.conll'
Note that this answer no longer works because the MaltParser API in NLTK was updated in August 2015. It is kept for legacy's sake.
Please see this answer to get MaltParser working with NLTK:
Disclaimer: This is not an eternal solution. The answer in the above link (posted in Feb 2016) will work for now, but when the MaltParser or NLTK API changes, the syntax for using MaltParser in NLTK might change as well.
A couple of problems with your setup:
The argument to train_from_file must be a file in CoNLL format, not a pre-trained model. For an mco file, you pass it to the MaltParser constructor using the mco and working_dir parameters.
The default Java heap is not large enough to load that mco file, so you'll have to tell java to use more heap space with the -Xmx parameter. Unfortunately this wasn't possible with the existing code, so I just checked in a change to allow an additional constructor parameter for java args. See here.
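To make the two points above concrete, here is a rough sketch of the kind of java command line involved. The helper build_malt_command is hypothetical (it is not NLTK's code), and the input file name is an assumption; the output path is the one from your traceback. The flags -w, -c, -i, -o and -m are MaltParser's own command-line options, and the JVM option -Xmx has to come before -jar.

```python
def build_malt_command(jar_path, working_dir, mco, input_file, output_file,
                       java_args=None):
    """Hypothetical helper: build the argument list for a MaltParser parse run."""
    cmd = ["java"]
    # JVM options such as -Xmx512m must precede -jar, or java ignores them
    cmd += list(java_args or [])
    cmd += [
        "-jar", jar_path,
        "-w", working_dir,   # directory that contains the .mco file
        "-c", mco,           # name of the pre-trained model (configuration)
        "-i", input_file,    # CoNLL-formatted input (file name is an assumption)
        "-o", output_file,   # CoNLL-formatted output
        "-m", "parse",       # parse with an existing model instead of training
    ]
    return cmd


cmd = build_malt_command(
    "/usr/lib/malt.jar",
    "/home/rohith/malt-1.7.2",
    "engmalt.linear-1.7",
    "/tmp/malt_input.conll",
    "/tmp/malt_output.conll",
    java_args=["-Xmx512m"],
)
print(" ".join(cmd))
```

If the java process fails (for example because it runs out of heap), no output file is written, which would explain an IOError like the one for '/tmp/malt_output.conll'.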
So here's what you need to do:
First, get the latest NLTK revision:
git clone https://github.com/nltk/nltk.git
(NOTE: If you can't use the git version of NLTK, then you'll have to update the file malt.py manually or copy it from here to have your own version.)
Second, rename the jar file to malt.jar, which is what NLTK expects:
cd /usr/lib/
ln -s maltparser-1.7.2.jar malt.jar
Then add an environment variable pointing to malt parser:
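One way to do this from inside Python, as a minimal sketch. It assumes the variable is named MALTPARSERHOME (the name older NLTK releases looked up, as far as I know; check your copy of malt.py) and that the jar sits in /usr/lib/ as in the question. It has to run before the parser object is created.

```python
import os

# Assumption: this NLTK version reads MALTPARSERHOME to locate malt.jar;
# verify the variable name in your malt.py before relying on it.
os.environ["MALTPARSERHOME"] = "/usr/lib"

# The path NLTK would then be expected to find (directory taken from the question)
malt_jar = os.path.join(os.environ["MALTPARSERHOME"], "malt.jar")
print(malt_jar)
```

Setting the variable in your shell profile instead has the same effect and persists across sessions.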
Finally, load and use malt parser in python:
>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/home/rohith/malt-1.7.2",
...                                     mco="engmalt.linear-1.7",
...                                     additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
'(This (sentence is a test))'
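Since the goal is a dependency graph rather than a bracketed tree, it may help to know that MaltParser writes its result as a CoNLL file (the '/tmp/malt_output.conll' in the traceback), which NLTK then loads. Below is a minimal sketch of reading that format into (head, relation, dependent) triples without NLTK. The function read_conll_edges is hypothetical, and the two-token sample is hand-written for illustration, not actual MaltParser output; the ten column names follow the CoNLL-X layout.

```python
# The ten tab-separated CoNLL-X columns, in order
CONLL_COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def read_conll_edges(text):
    """Hypothetical helper: return (head_word, relation, dependent_word)
    triples for a single CoNLL-formatted sentence."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate sentences; skipped here
        rows.append(dict(zip(CONLL_COLUMNS, line.split("\t"))))

    # Map token ids to word forms; id "0" is the artificial root node
    words = dict((row["id"], row["form"]) for row in rows)
    words["0"] = "ROOT"
    return [(words[row["head"]], row["deprel"], row["form"]) for row in rows]


# Hand-written illustrative input, NOT real parser output
sample = "\n".join([
    "\t".join(["1", "Dogs", "_", "NNS", "NNS", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "_", "VBP", "VBP", "_", "0", "root", "_", "_"]),
])

edges = read_conll_edges(sample)
print(edges)
```

Each triple is one edge of the graph, so the list can be fed directly into whatever graph structure the project uses.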