Dr.Kameleon Dr.Kameleon - 3 months ago 17
C++ Question

Choice of Parser Generator

OK, I understand this question may sound quite opinion-based, however since I have several specific criteria of choice, I think it would make a nice fit for SO. So, here I am...

I've worked with compiler/interpreter construction in the past quite a lot (mostly as a hobby obviously) and for some reason I stuck with Lex/Yacc (or Flex/Bison, I'm quite confused as to how they call them now... lol).

However, since I'm finding myself currently playing with yet another hobbyist interpreter project, I thought I should give a try to something different maybe in order to avoid what I don't like about Lex/Yacc.

So, namely :


  • Better C++-friendly (than C-friendly)

  • Good documentation (preferably with some existing grammars already implemented + instructions on how to compile/use them - sounds rather obvious, huh?)

  • Could be LALR, LL(*), Recursive descent, I don't really care (note: an input as to which type you would prefer and for what type of implementations would be great though; I've never really understood their pros and cons, to be perfectly honest, even though I do know what they refer to)

  • Combining the Lexer part and the Parser grammar in one file wouldn't be bad at all; never really got it why it has to be split in two.

  • Last but not least : I've always had issues with... issues. I mean - at least as far as Lex/Yacc go, parsing error messages are more-or-less cryptic (
    Syntax Error
    ... Yuhuu!) and rarely do they help diagnose an issue. (Well, unless you are the one who developed the interpreter... lol). So, is there anything better than Lex/Yacc regarding error reporting?



OK, I hope that wasn't too verbose. I'm all ears! :-)

Answer

I'm just going to answer the last question, with a slight edit:

at least as far as Lex/Yacc go, parsing error messages are more-or-less cryptic (Syntax Error... Yuhuu!) and rarely do they help diagnose an issue. (Well, unless you are the one who developed the interpreter... lol). So, is there anything better than a better way to use Lex/Yacc regarding error reporting?

Well, start by using a modern version of bison, which has a reasonably complete manual online (and quite possibly installed with the executable, depending on how you install bison). In particular, start with these declarations:

%define parse.error verbose
%define parse.lac full

That will at least replace the mysterious "syntax error" error with a list of "expected" token types.

Then make sure that your token types have meaningful names, because they will get presented to the user as part of the error message. If you're used to using IDENTIFIER as a terminal, then you're probably ok, but the message "Expected TOK_YY_ID" is a bit geeky. You can declare a readable for a terminal in the type declaration:

%type TOK_YY_ID "identifier"

That will only take you so far. In many cases, knowing what was "expected" is sufficient to understand a syntax error, but sometimes it's useful to be more explicit. In such cases, it's useful to actually define error rules. Getting these right is more an art than a science, but that's true of all approaches to error reporting/recovery; the key is to try to be as specific as possible about what the erroneous syntax looks like, and not more specific than necessary.

One interesting approach to error reporting is to use the current parser state and lookahead token (both of which are visible at the moment of error reporting) to lookup up a custom error message, if one exists. I think this approach has been a part of compiler folklore for a long time, and I'm sure I've seen several articles about it over the decades. Here is a relatively recent article by Russ Cox.