Danny Delott - 4 months ago
Java Question

Using the Stanford Dependency Parser on a previously tagged sentence

I'm currently using the Twitter POS tagger available here to tag tweets with the Penn Treebank tagset.

Here is that code:

import java.io.IOException;
import java.util.List;

import cmu.arktweetnlp.Tagger;
import cmu.arktweetnlp.Tagger.TaggedToken;

/* Tags the tweet text */
List<TaggedToken> tagTweet(String text) throws IOException {

    // Loads the model trained on Penn Treebank POS tags
    // (the PTB-tagset model shipped with ArkTweetNLP)
    Tagger tagger = new Tagger();
    tagger.loadModel("/cmu/arktweetnlp/model.ritter_ptb_alldata_fixed.20130723");

    // Tags the tweet text
    List<TaggedToken> taggedTokens = tagger.tokenizeAndTag(text);

    return taggedTokens;
}
Now I need to identify the direct objects in these tagged tweets. After some searching, I've discovered that the Stanford Parser can do this by way of the Stanford Typed Dependencies (online example). Using the dobj() relation, I should be able to get what I need.

However, I have not found any good documentation on how to feed already-tagged sentences into this tool. From what I understand, before using the dependency parser I need to build a tree from the sentence's tokens and tags. How is this done? I have not been able to find any example code.

The Twitter POS tagger bundles an instance of the Stanford NLP tools, so I'm not far off; however, I am not familiar enough with the Stanford tools to feed my POS-tagged text into the dependency parser properly. The FAQ does mention this functionality, but without any example code to go off of, I'm a bit stuck.


Here is how it is done with completely manual creation of the List discussed in the FAQ:

import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

// Load the parser model once, then reuse it
LexicalizedParser lp = LexicalizedParser.loadModel(
    "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

String[] sent3 = { "It", "can", "can", "it", "." };
// Parser gets second "can" wrong without help (parsing it as modal MD)
String[] tag3 = { "PRP", "MD", "VB", "PRP", "." };
List<TaggedWord> sentence3 = new ArrayList<TaggedWord>();
for (int i = 0; i < sent3.length; i++) {
  sentence3.add(new TaggedWord(sent3[i], tag3[i]));
}
Tree parse = lp.parse(sentence3);
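From there, the typed dependencies (including dobj) can be read off the parse tree. A sketch, assuming the standard GrammaticalStructure API from edu.stanford.nlp.trees and continuing from the `parse` variable above:

```java
import java.util.List;

import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;

// Build the typed-dependency representation from the parse tree
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();

// Keep only the direct-object relations
for (TypedDependency td : tdl) {
  if ("dobj".equals(td.reln().getShortName())) {
    System.out.println(td);
  }
}
```

For "It can can it." this should report the dobj relation between the verb "can" and its object "it"; the same loop works on trees parsed from the pre-tagged TaggedWord lists above.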