Tim Hopper - 1 month ago
Python Question

Extracting Prepositional Phrases from Sentence

I'm trying to extract prepositional phrases from sentences using NLTK. Is there a way for me to do this automatically (e.g. feed a function a sentence and get back its prepositional phrases)?

The examples here seem to require that you start with a grammar before you can get a parse tree. Can I automatically get the grammar and use that to get the parse tree?

Obviously I could tag a sentence, pick out prepositions and the subsequent noun, but this is complicated when the prepositional complement is compound.

Answer

What you really want is to fully parse your sentence with a robust statistical parser (e.g. the Stanford parser), and then look for constituents marked with PP:

(ROOT
  (S
    (NP (NNP John))
    (VP (VBZ lives)
      (PP (IN in)
        (NP (DT a) (NN house)))
      (PP (IN by)
        (NP (DT the) (NN sea))))))

I am not sure whether NLTK can parse sentences on its own, or how accurate its parsing is if it can, but it is not much of a problem to call an external parser from Python and then process its output. Using a parser will save you a lot of time and effort, since the parser takes care of everything, and it is the only reliable way to do this job.
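As a rough illustration of the "process the output" step, here is a minimal sketch that reads the bracketed parse shown above and collects every constituent labeled PP. It assumes you already have the parser's output as a string; the helper names (read_tree, leaves, prepositional_phrases) are made up for this example and are not part of NLTK or the Stanford parser. It uses only the standard library, though NLTK's Tree.fromstring should be able to replace the hand-written reader.

```python
import re

# Parser output copied from the example above (hard-coded for illustration).
PARSE = """
(ROOT
  (S
    (NP (NNP John))
    (VP (VBZ lives)
      (PP (IN in)
        (NP (DT a) (NN house)))
      (PP (IN by)
        (NP (DT the) (NN sea))))))
"""

def read_tree(text):
    """Read a bracketed parse into nested (label, children) tuples.

    Words (leaves) are kept as plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()

def leaves(tree):
    """Return the words under a node, in sentence order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1]:
        words.extend(leaves(child))
    return words

def prepositional_phrases(tree, label="PP"):
    """Collect the text of every constituent with the given label."""
    if isinstance(tree, str):
        return []
    found = []
    if tree[0] == label:
        found.append(" ".join(leaves(tree)))
    for child in tree[1]:
        found.extend(prepositional_phrases(child, label))
    return found

phrases = prepositional_phrases(read_tree(PARSE))
print(phrases)  # ['in a house', 'by the sea']
```

Because the search recurses into every node, a PP nested inside another PP is reported as well, after the phrase that contains it. Whether you want both or only the outermost one depends on your use case.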