soarjay soarjay - 1 month ago 12
Python Question

Issue creating LL(1) grammar

I'm learning how parsers work by creating a simple recursive descent parser. However I'm having a problem defining my grammar to be LL(1). I want to be able to parse the following two statements:

a = 1
a + 1


To do this I've created the following grammar rules:

statement: assignent | expression
assignment: NAME EQUALS expression
expression: term [(PLUS|MINUS) term]
term: NAME | NUMBER


However, this leads to ambiguity when using a LL(1) parser as when a NAME token is encountered in the
statement
rule, it doesn't know whether it is an
assignment
or an
expression
without a look-ahead.

Python's grammar is LL(1) so I know this is possible to do but I can't figure out how to do it. I've looked at Python's grammar rules found here (https://docs.python.org/3/reference/grammar.html) but I'm still not sure how they implement this.

Any help would be greatly appreciated :)

Answer

Just treat = as an operator with very low precedence. However (unless you want a language like C where = really is an operator with very low precedence), you need to exclude it from internal (e.g. parenthetic) expressions.

If you had only multiplication and addition, you could use:

expression: factor ['+' factor]
factor:     term ['*' term]
term:       ID | NUMBER | '(' expression ')'

That is a guide for operator precedence: has higher precedence because the arguments to + can include s but not vice versa. So we could just add assignment:

statement: expression ['=' expression]

Unfortunately, that would allow, for example:

(a + 1) = b

which is undesirable. So it needs to be eliminated, but it is possible to eliminate it when the production is accepted (by a check of the form of the first expression), rather than in the grammar itself. As I understand it, that's what the Python parser does; see the long comment about test and keywords.

If you used an LR(1) parser instead, you wouldn't have this problem.