Alana Oliveira - 5 months ago
Python Question

pyparsing.ParseException when using parseString (searchString works)

I'm trying to parse some traffic violation sentences using pyparsing. When I use searchString it works fine, but when I use parseString a pyparsing.ParseException is thrown. Can anybody help me by explaining what is wrong with my code?

from pyparsing import Or, Literal, oneOf, OneOrMore, nums, alphas, Regex, Word, \
    SkipTo, LineEnd, originalTextFor, Optional, ZeroOrMore, Keyword, Group
import pyparsing as pp

from nltk.tag import pos_tag

sentences = ['Failure to control vehicle speed on highway to avoid collision',
             'Failure to stop at stop sign',
             'Introducing additives into special fuel by unauthorized person and contrary to regulations',
             'driver fail to stop at yield sign at nearest pointf approaching traffic view when req. for safety',
             'Operating unregistered motor vehicle on highway',
             'Exceeding maximum speed: 39 MPH in a posted 30 MPH zone']

for sentence in sentences:
    words = pos_tag(sentence.split())
    # print(words)
    verbs = [word for word, pos in words if pos in ['VB', 'VBD', 'VBG']]
    nouns = [word for word, pos in words if pos == 'NN']
    adjectives = [word for word, pos in words if pos == 'JJ']

    adjectives.append('great')  # initializing
    verbs.append('get')  # initializing

    object_generator = oneOf('for to')
    location_generator = oneOf('at in into on onto over within')
    speed_generator = oneOf('MPH KM/H')

    noun = oneOf(nouns)
    adjective = oneOf(adjectives)

    location = location_generator + pp.Group(Optional(adjective) + noun)

    action = oneOf(verbs)
    speed = Word(nums) + speed_generator

    grammar = action | location | speed

    parsed = grammar.parseString(sentence)

    print(parsed)

Error traceback

Traceback (most recent call last):
  File "", line 35, in <module>
    parsed = grammar.parseString(sentence)
  File "/Users/alana/anaconda/lib/python2.7/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected Re:('control|avoid|get') (at char 0), (line:1, col:1)


searchString works because it scans the whole input and skips over any text that doesn't match the grammar. parseString is much stricter: the grammar must match starting right at the first character of the input string (after any leading whitespace), or it raises ParseException. (By default it ignores whatever text is left over after a successful match; pass parseAll=True if the entire string must match.) In your case, the grammar is a little difficult to determine, since it is auto-generated from the NLTK analysis of each input sentence (an interesting approach, btw). If you just print the grammar itself, it may give you some insight into what strings it is looking for. For instance, I'm guessing NLTK will tag 'Failure' in your first example as a noun, yet none of the 3 expressions in your grammar starts with a noun, so parseString fails at char 0, which is exactly what your traceback reports.
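To make the difference concrete without depending on NLTK, here is a minimal sketch that hard-codes the word lists. The lists are assumptions standing in for whatever pos_tag actually returns; they are chosen so that, as guessed above, 'Failure' is not a verb. The grammar itself mirrors the one in the question.

```python
import pyparsing as pp

# Hypothetical word lists standing in for NLTK's pos_tag output;
# the real tagger may assign different tags to these words.
verbs = ["control", "avoid", "get"]
nouns = ["vehicle", "highway"]
adjectives = ["great"]

# Same structure as the grammar in the question.
action = pp.oneOf(verbs)
noun = pp.oneOf(nouns)
adjective = pp.oneOf(adjectives)
location = pp.oneOf("at in into on onto over within") + pp.Group(pp.Optional(adjective) + noun)
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
grammar = action | location | speed

sentence = "Failure to control vehicle speed on highway to avoid collision"

# searchString scans the whole input and collects every place the grammar matches.
matches = grammar.searchString(sentence)
print(matches)

# parseString must match at char 0, and "Failure" matches none of the three alternatives.
try:
    grammar.parseString(sentence)
    parse_failed = False
except pp.ParseException as err:
    parse_failed = True
    print("parseString failed:", err)

# With a leading verb, parseString succeeds and by default ignores the rest...
result = grammar.parseString("control vehicle speed")
print(result)

# ...but parseAll=True demands that the entire input be consumed.
try:
    grammar.parseString("control vehicle speed", parseAll=True)
    strict_failed = False
except pp.ParseException:
    strict_failed = True
```

Note that oneOf builds a plain regex here, so it can also match inside longer words; that does not change the char-0 behavior being illustrated.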

You'll probably need to print the noun, adjective, and verb lists that NLTK produces for each sentence, and then see how they map to your generated grammar.
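One way to do that printing is to wrap the grammar construction in a small function that reports the lists and the resulting grammar. The function name build_grammar and the sample tags below are hypothetical (not real NLTK output); the (word, tag) pairs have the same shape pos_tag returns, so in the real loop you would pass pos_tag(sentence.split()) instead. The exact text printed for the grammar varies between pyparsing versions.

```python
import pyparsing as pp

def build_grammar(tagged_words):
    """Hypothetical helper: rebuild the question's grammar from (word, tag) pairs."""
    verbs = [w for w, pos in tagged_words if pos in ("VB", "VBD", "VBG")]
    nouns = [w for w, pos in tagged_words if pos == "NN"]
    adjectives = [w for w, pos in tagged_words if pos == "JJ"]
    adjectives.append("great")  # same placeholder entries as in the question
    verbs.append("get")

    # Show what the tagger produced before building anything from it.
    print("verbs:     ", verbs)
    print("nouns:     ", nouns)
    print("adjectives:", adjectives)

    location = pp.oneOf("at in into on onto over within") + pp.Group(
        pp.Optional(pp.oneOf(adjectives)) + pp.oneOf(nouns))
    speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
    grammar = pp.oneOf(verbs) | location | speed
    print("grammar:   ", grammar)
    return grammar, verbs, nouns

# Hypothetical tagging of one sentence; real NLTK output may differ.
tagged = [("Failure", "NN"), ("to", "TO"), ("stop", "VB"),
          ("at", "IN"), ("stop", "NN"), ("sign", "NN")]
sentence = " ".join(w for w, _ in tagged)
grammar, verbs, nouns = build_grammar(tagged)

# parseString needs the grammar to match at char 0; report why it cannot.
first_word, first_tag = tagged[0]
try:
    grammar.parseString(sentence)
    starts_ok = True
except pp.ParseException as err:
    starts_ok = False
    print("parseString fails:", err)
    print("first word %r is tagged %s" % (first_word, first_tag))
```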

You can also combine the results of the multiple matches that searchString finds in the sentence using Python's sum() builtin:

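grammar = action("action") | Group(location)("location") | Group(speed)("speed")

# parsed = grammar.parseString(sentence)   # fails unless the match starts at char 0
parsed = sum(grammar.searchString(sentence))   # merges all matches into one ParseResults
print(parsed.dump())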
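Here is a self-contained sketch of that sum() approach, again with hypothetical hard-coded vocabularies in place of the NLTK-derived lists. It relies on ParseResults accepting 0 + results (which is what lets sum() start from its default 0), so the separate searchString matches are concatenated into one ParseResults and the result names remain accessible.

```python
import pyparsing as pp

# Hypothetical vocabularies standing in for the NLTK-derived lists.
action = pp.oneOf("control avoid get")
noun = pp.oneOf("vehicle highway")
adjective = pp.oneOf("great")
location = pp.oneOf("at in into on onto over within") + pp.Group(pp.Optional(adjective) + noun)
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")

# Named alternatives, as in the grammar above.
grammar = action("action") | pp.Group(location)("location") | pp.Group(speed)("speed")

sentence = "Failure to control vehicle speed on highway to avoid collision"
matches = grammar.searchString(sentence)

# sum() starts from the int 0; ParseResults handles 0 + results, so the
# individual matches are concatenated into one ParseResults. With no matches
# at all, sum() would just return 0, so guard that case.
parsed = sum(matches) if matches else None

if parsed is not None:
    print(parsed.asList())
    print(parsed["location"])
```

If a name such as "action" occurs in several matches, looking it up by name returns only the last occurrence, so iterate over the tokens when you need all of them.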