raptor96 raptor96 - 3 months ago 19
Python Question

Converting output of dependency parsing to tree

I am using

Stanford dependency parser
and the I get the following output of the sentence


I shot an elephant in my sleep


python dep_parsing.py
[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')), ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')), ((u'elephant', u'NN'), u'det', (u'an', u'DT')), ((u'shot', u'VBD'), u'nmod', (u'sleep', u'NN')), ((u'sleep', u'NN'), u'case', (u'in', u'IN')), ((u'sleep', u'NN'), u'nmod:poss', (u'my', u'PRP$'))]


I want to convert this into a graph with nodes being each token and edges being the relation between them.

I need the graph structure for further processing hence it would help if modification to it are easy and also must be easily representable.

Here is my code till now.

from nltk.parse.stanford import StanfordDependencyParser
stanford_parser_dir = 'stanford-parser/'
eng_model_path = stanford_parser_dir + "stanford-parser-models/edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir + "stanford-parser.jar"

dependency_parser = StanfordDependencyParser(path_to_jar=my_path_to_jar, path_to_models_jar=my_path_to_models_jar)

result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = result.next()
a = list(dep.triples())
print a


How can I make such a graph structure?

Answer

You can traverse over dep.triples() and get your desired output.

Code:

for triple in dep.triples():
    print triple[1],"(",triple[0][0],", ",triple[2][0],")"

Output:

nsubj ( shot ,  I )
dobj ( shot ,  elephant )
det ( elephant ,  an )
nmod ( shot ,  sleep )
case ( sleep ,  in )
nmod:poss ( sleep ,  my )

For more information you can check : NLTK Dependencygraph methods triples(), to_dot() and dep.tree().draw()

Edit -

The output of dep.to_dot() is

digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 2 [label="root"]
1 [label="1 (I)"]
2 [label="2 (shot)"]
2 -> 4 [label="dobj"]
2 -> 7 [label="nmod"]
2 -> 1 [label="nsubj"]
3 [label="3 (an)"]
4 [label="4 (elephant)"]
4 -> 3 [label="det"]
5 [label="5 (in)"]
6 [label="6 (my)"]
7 [label="7 (sleep)"]
7 -> 5 [label="case"]
7 -> 6 [label="nmod:poss"]
}