Tex Tex - 1 month ago 6
Java Question

Get certain nodes out of a Parse Tree

I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement the algorithm.

At the moment, I don't understand how to:


  • Access nodes based on their POS tag (e.g., I need to start with a pronoun: how do I get all the pronouns?).

  • Use visitors. I'm a bit of a Java noob, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much about this for the Stanford parser's Tree structure, though. Is that jgrapht? If so, could you give me some pointers to code snippets?


Answer

@dhg's answer works fine, but here are two other options that it might also be useful to know about:

  • The Tree class implements Iterable, so you can iterate through all the nodes of a Tree (or, strictly, the subtrees headed by each node) in a pre-order traversal. Assuming t is your parsed Tree:

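    List<Tree> pronouns = new ArrayList<>();
    for (Tree subtree : t) {  // pre-order: each node first, then its children left to right
        if (subtree.label().value().equals("PRP")) {
            pronouns.add(subtree);
        }
    }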
    
  • You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex but matches patterns over trees instead of strings. You would have something like:

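    List<Tree> pronouns = new ArrayList<>();
    TregexPattern tgrepPattern = TregexPattern.compile("PRP");
    TregexMatcher m = tgrepPattern.matcher(t);
    // find() advances to the next match, much like java.util.regex.Matcher
    while (m.find()) {
        Tree subtree = m.getMatch();  // the node matched by the root of the pattern
        pronouns.add(subtree);
    }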
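To make the pre-order iteration above concrete for readers coming from C++, here is a minimal, self-contained sketch of a tree type that implements Iterable by walking its nodes in pre-order. In Java, a loop like this often takes the place of a hand-written visitor. MiniTree and PronounFinder are hypothetical classes written for illustration only, not part of the Stanford parser; with the real parser you would iterate over edu.stanford.nlp.trees.Tree as shown above. The sketch also matches PRP$, since the Penn Treebank tag set used by the English Stanford parser models tags possessive pronouns such as "his" as PRP$ rather than PRP.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a parse-tree node (NOT Stanford's Tree class):
// a label such as "NP", "PRP" or a word, plus an ordered list of children.
class MiniTree implements Iterable<MiniTree> {
    final String label;
    final List<MiniTree> children;

    MiniTree(String label, MiniTree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Pre-order iterator: yields this node first, then each child's subtree, left to right.
    @Override
    public Iterator<MiniTree> iterator() {
        Deque<MiniTree> stack = new ArrayDeque<>();
        stack.push(this);
        return new Iterator<MiniTree>() {
            @Override
            public boolean hasNext() {
                return !stack.isEmpty();
            }

            @Override
            public MiniTree next() {
                if (stack.isEmpty()) {
                    throw new NoSuchElementException();
                }
                MiniTree node = stack.pop();
                // Push children in reverse so the leftmost child is visited next.
                for (int i = node.children.size() - 1; i >= 0; i--) {
                    stack.push(node.children.get(i));
                }
                return node;
            }
        };
    }
}

class PronounFinder {
    // Collects nodes tagged PRP (personal) or PRP$ (possessive), in pre-order.
    static List<MiniTree> pronouns(MiniTree root) {
        List<MiniTree> result = new ArrayList<>();
        for (MiniTree subtree : root) {
            if (subtree.label.equals("PRP") || subtree.label.equals("PRP$")) {
                result.add(subtree);
            }
        }
        return result;
    }
}
```

For example, for the tree (ROOT (S (NP (PRP He)) (VP (VBD saw) (NP (PRP$ his) (NN dog))))), PronounFinder.pronouns returns the PRP node over "He" followed by the PRP$ node over "his".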
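Tregex patterns can also express relations between nodes rather than just labels: for instance, NP < PRP should match an NP that immediately dominates a PRP, which is the node where Hobbs' algorithm begins before going up the tree to the first NP or S node above it. The stdlib-only sketch below computes those two steps by hand, to show what such a match amounts to. Node and HobbsStart (with its methods) are hypothetical names, not Stanford classes. The parent lookup searches down from the root, much as CoreNLP's Tree.parent(Tree root) does for trees that do not store parent pointers, and only labels exactly equal to NP or S count as stopping points, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal parse-tree node (not Stanford's Tree): label + children, no parent pointer.
class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

class HobbsStart {
    // Rough hand-written analogue of the tregex pattern "NP < PRP":
    // every NP with a PRP as an immediate child, collected in pre-order.
    static List<Node> npsOverPronouns(Node root) {
        List<Node> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Node node, List<Node> out) {
        if (node.label.equals("NP")) {
            for (Node child : node.children) {
                if (child.label.equals("PRP")) {
                    out.add(node);
                    break; // in this sketch, add each NP at most once
                }
            }
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    // Depth-first search from the root for target's parent; null if target is the root.
    static Node parentOf(Node root, Node target) {
        for (Node child : root.children) {
            if (child == target) {
                return root;
            }
            Node found = parentOf(child, target);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Simplified Hobbs step: go up from start to the first NP or S ancestor (null if none).
    static Node firstNpOrSAbove(Node root, Node start) {
        Node current = parentOf(root, start);
        while (current != null && !current.label.equals("NP") && !current.label.equals("S")) {
            current = parentOf(root, current);
        }
        return current;
    }
}
```

In the tree (ROOT (S (NP (NNP John)) (VP (VBD said) (SBAR (S (NP (PRP he)) (VP (VBD left))))))), the NP over "he" is the only match, and walking up from it stops at the embedded S.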