Purely as a self-learning exercise, I'm trying to write a Java parser in Perl using the
To decide if you have the right answer, you ideally have to compare against some kind of standard. This is hard for computer languages.
Comparing ASTs is going to be hard, because there are no standards for them. Each parser that builds ASTs builds one whose structure was designed by the person who coded the parser.
That means if you build an AST-producing parser, and you get somebody else's AST-producing parser, you'll discover that the AST nodes you have chosen don't match those of the other AST. Now you'll have to build a mapping from your AST to the other one (and how will you know the mapping is valid?). You can try to make your parser generate the same AST as the other parser, but what you will discover is that the AST you produce is influenced by the parsing technology you use.
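To make the mapping problem concrete, here is a small sketch in Python. Both AST shapes are invented for illustration (they are not the output of any real Java parser), as are the names `ast_a`, `ast_b`, `OPERATOR_TO_NODE`, and `a_to_b`. It shows two plausible trees for the Java expression `a + b * c` and the translation needed before they can be compared.

```python
# Two hypothetical AST shapes for the same Java expression: a + b * c

# "Parser A" (invented): one generic BinaryExpr node carrying an operator field.
ast_a = ("BinaryExpr", "+",
         ("Name", "a"),
         ("BinaryExpr", "*", ("Name", "b"), ("Name", "c")))

# "Parser B" (invented): one node type per operator, identifiers called Id.
ast_b = ("Add",
         ("Id", "a"),
         ("Mul", ("Id", "b"), ("Id", "c")))

# Hand-written correspondence between A's operators and B's node types.
OPERATOR_TO_NODE = {"+": "Add", "*": "Mul"}


def a_to_b(node):
    """Translate a parser-A tree into parser-B's vocabulary."""
    kind = node[0]
    if kind == "Name":
        return ("Id", node[1])
    if kind == "BinaryExpr":
        _, op, left, right = node
        return (OPERATOR_TO_NODE[op], a_to_b(left), a_to_b(right))
    # Any node type without a rule means the mapping is incomplete.
    raise ValueError(f"no mapping for node type {kind!r}")


print(a_to_b(ast_a) == ast_b)
```

Even in this toy case the mapping is extra code that can itself be wrong, and a mismatch does not tell you whether the parser or the mapping is at fault.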
We have a similar problem with the Java front end my company produces (see bio if you want to know more). What we settle for is testing that the answer is self-consistent and then we do a lot of long-term experiential testing on big pieces of code.
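One generic kind of self-consistency check, offered here as an assumption and not necessarily what the front end described above does, is a round trip: parse the source, regenerate text from the tree, parse that text again, and require the two trees to match. The sketch below uses Python's own standard `ast` module on Python source purely because it is self-contained (it needs Python 3.9+ for `ast.unparse`); the helper name `round_trip_is_stable` is made up for the example.

```python
import ast


def round_trip_is_stable(source: str) -> bool:
    """Parse, regenerate source text, re-parse, and compare the two trees."""
    first = ast.parse(source)
    regenerated = ast.unparse(first)   # tree -> source text
    second = ast.parse(regenerated)    # source text -> tree again
    # ast.dump omits line/column attributes by default, so this compares
    # tree structure only, not the original layout.
    return ast.dump(first) == ast.dump(second)


print(round_trip_is_stable("x = (1 + 2) * 3\nprint(x)\n"))
```

A check like this can only show that the parser agrees with itself. A parser that consistently misreads a construct can still pass, which is why it has to be combined with long-running testing on large bodies of real code.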
Our solution is to:
It is tough to get this right; you have to get close and keep the testing pressure on continuously, especially since Java the language keeps moving (we're at Java 8, and Java 9 is being threatened). Bottom line: it is a lot of work to build such a parser and check its sanity.
We'd love to have an independent set of tests, but we haven't seen one in the wild. And I would expect that those tests, if they exist (I assume Oracle and IBM have them), don't really test parsing and name resolution directly, but rather test that some bit of code compiles and runs, producing a known result. Since we aren't building a compiler, we wouldn't be able to run such tests even if we had them. We would be able to do the name resolution and type consistency checks, and that would be helpful.
[We actually do this for a number of language front ends. If you think Java is hard, try this with C++.]