OK, I understand this question may sound quite opinion-based, however since I have several specific criteria of choice, I think it would make a nice fit for SO. So, here I am...
I've worked with compiler/interpreter construction in the past quite a lot (mostly as a hobby obviously) and for some reason I stuck with Lex/Yacc (or Flex/Bison, I'm quite confused as to how they call them now... lol).
However, since I'm finding myself currently playing with yet another hobbyist interpreter project, I thought I should give a try to something different maybe in order to avoid what I don't like about Lex/Yacc.
So, namely :
I'm just going to answer the last question, with a slight edit:
at least as far as Lex/Yacc go, parsing error messages are more-or-less cryptic (Syntax Error... Yuhuu!) and rarely do they help diagnose an issue. (Well, unless you are the one who developed the interpreter... lol). So, is there
anything better thana better way to use Lex/Yacc regarding error reporting?
Well, start by using a modern version of bison, which has a reasonably complete manual online (and quite possibly installed with the executable, depending on how you install bison). In particular, start with these declarations:
%define parse.error verbose %define parse.lac full
That will at least replace the mysterious "syntax error" error with a list of "expected" token types.
Then make sure that your token types have meaningful names, because they will get presented to the user as part of the error message. If you're used to using
IDENTIFIER as a terminal, then you're probably ok, but the message "Expected TOK_YY_ID" is a bit geeky. You can declare a readable for a terminal in the
%type TOK_YY_ID "identifier"
That will only take you so far. In many cases, knowing what was "expected" is sufficient to understand a syntax error, but sometimes it's useful to be more explicit. In such cases, it's useful to actually define
error rules. Getting these right is more an art than a science, but that's true of all approaches to error reporting/recovery; the key is to try to be as specific as possible about what the erroneous syntax looks like, and not more specific than necessary.
One interesting approach to error reporting is to use the current parser state and lookahead token (both of which are visible at the moment of error reporting) to lookup up a custom error message, if one exists. I think this approach has been a part of compiler folklore for a long time, and I'm sure I've seen several articles about it over the decades. Here is a relatively recent article by Russ Cox.