monobogdan - 5 months ago
Java Question

ANTLR 4: Parsing grammar

I want to parse some data from an AppleSoft BASIC script.
I chose ANTLR and downloaded this grammar: jvmBasic

I'm trying to extract the function name without its parameters:

return parser.prog().line(0).amprstmt(0).statement().getText();


but it returns PRINT"Hello!", i.e. the full statement except the line number.
Here is the string I want to parse:


10 PRINT "Hello!"

Answer

I think this really depends on how your ANTLR program is implemented, but if you are using a tree walker/listener you probably want to target the rule for the specific construct rather than the entire "statement" rule, which is recursive and covers many kinds of statement:

//each line can have one to many amprstmt's
line
   : (linenumber ((amprstmt (COLON amprstmt?)*) | (COMMENT | REM)))
   ;

amprstmt
   : (amperoper? statement) //encounters a statement here
   | (COMMENT | REM)
   ;
//statements can be made of 1 to many sub statements
statement
   : (CLS | LOAD | SAVE | TRACE | NOTRACE | FLASH | INVERSE | GR | NORMAL | SHLOAD | CLEAR | RUN | STOP | TEXT | HOME | HGR | HGR2)
   | prstmt
   | printstmt1 //the print rule
   //MANY MANY OTHER RULES HERE TOO LONG TO PASTE........
   ;
//the example rule that matches when the PRINT token is encountered
printstmt1
   : (PRINT | QUESTION) printlist?
   ;

printlist
   : expression (COMMA | SEMICOLON)? printlist*
   ;

As you can see from this BNF-style grammar, the statement rule includes the rule for a print statement as well as every other kind of statement, so for your input it covers both PRINT and "Hello!". Calling getText() on it therefore returns the text of all the tokens it matched: in your case everything except the line number, which is matched by linenumber, a rule outside of statement.
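
To make the getText() behaviour concrete, here is a toy model in plain Java. It does not use ANTLR at all: Node, sampleLine and the tree shape are hypothetical and heavily simplified (the real parse tree has more layers between printlist and the string). It only mimics the idea that a node's text is the concatenation of the token text beneath it, with whitespace already dropped by the lexer.

```java
import java.util.Arrays;
import java.util.List;

public class GetTextSketch {
    /** Hypothetical stand-in for a parse-tree node (not an ANTLR class).
     *  A node without children is treated as a token holding its text. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... kids) {
            this.label = label;
            this.children = Arrays.asList(kids);
        }

        /** Concatenates the text of all tokens below this node, in order. */
        String getText() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                sb.append(child.getText());
            }
            return sb.toString();
        }
    }

    /** Builds a simplified tree for the input: 10 PRINT "Hello!" */
    static Node sampleLine() {
        Node printstmt1 = new Node("printstmt1",
                new Node("PRINT"),
                new Node("printlist", new Node("\"Hello!\"")));
        Node statement = new Node("statement", printstmt1);
        return new Node("line",
                new Node("linenumber", new Node("10")),
                new Node("amprstmt", statement));
    }

    public static void main(String[] args) {
        Node line = sampleLine();
        Node statement = line.children.get(1).children.get(0);
        Node printstmt1 = statement.children.get(0);

        System.out.println(line.getText());                       // 10PRINT"Hello!"
        System.out.println(statement.getText());                  // PRINT"Hello!"
        System.out.println(printstmt1.children.get(0).getText()); // PRINT
    }
}
```

The line number is a sibling of the statement subtree, so it never shows up in the statement's text, and only the first child of printstmt1 gives the bare keyword.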

If you want to handle specific rules when they are encountered, you most likely want to add behaviour to the methods ANTLR generates for each rule by extending the generated listener (the jvmBasicListener interface, or its empty implementation jvmBasicBaseListener) as shown here

example:

-jvmBasicListener.java
-extended to jvmBasicCustomListener.java

@Override
public void enterPrintstmt1(jvmBasicParser.Printstmt1Context ctx) {
    // Called each time the walker enters a printstmt1 node
    System.out.println(ctx.getText());
}
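
As a rough illustration of what the listener approach does, the sketch below imitates a tree walk in plain Java without ANTLR. Node, Listener, walk, sampleTree and printKeywords are all hypothetical names made up for this example; with the real generated classes the walking is done for you and you only override the enter/exit methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListenerSketch {
    /** Hypothetical parse-tree node; a node without children is a token. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... kids) {
            this.label = label;
            this.children = Arrays.asList(kids);
        }
    }

    /** Hypothetical callback, playing the role of the generated enterXxx methods. */
    interface Listener {
        void enterRule(Node ruleNode);
    }

    /** Depth-first walk that notifies the listener on every rule node. */
    static void walk(Node node, Listener listener) {
        if (node.children.isEmpty()) {
            return; // tokens do not trigger rule callbacks in this sketch
        }
        listener.enterRule(node);
        for (Node child : node.children) {
            walk(child, listener);
        }
    }

    /** Simplified tree for the input: 10 PRINT "Hello!" */
    static Node sampleTree() {
        return new Node("line",
                new Node("linenumber", new Node("10")),
                new Node("amprstmt",
                        new Node("statement",
                                new Node("printstmt1",
                                        new Node("PRINT"),
                                        new Node("printlist", new Node("\"Hello!\""))))));
    }

    /** Reacts only to printstmt1 nodes and records their first token. */
    static List<String> printKeywords(Node root) {
        List<String> found = new ArrayList<>();
        walk(root, node -> {
            if (node.label.equals("printstmt1")) {
                found.add(node.children.get(0).label);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        System.out.println(printKeywords(sampleTree())); // [PRINT]
    }
}
```

Because the callback reacts to the specific rule rather than to statement as a whole, it can pick out just the keyword and ignore the print list.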

However, if all of this is already set up and you just want to return a string value from the single line you have, you can try going one level lower and addressing the child nodes of statement (amprstmt -> statement -> printstmt1). Note that getText() on printstmt1 still includes the print list, so take only the rule's first token to get the keyword:

 return parser.prog().line(0).amprstmt(0).statement().printstmt1().getStart().getText();

Just to narrow my answer slightly, the rules that specifically address your input 10 PRINT "Hello!" would be:

linenumber (contains Number), statement->printstmt1 and statement->datastmt->datum (contains STRINGLITERAL)

So, as shown above, the linenumber rule stands on its own and the other two rules that define your text are children of statement, which explains why you get everything except the line number when you take the statement rule's text.

Addressing each of these and calling getText() on them, rather than on an encompassing rule such as statement, may give you the result you are looking for.


I will update this to address your question, since the answer may be slightly longer. The easiest way, in my opinion, to handle specific rules without writing a listener or visitor is to embed actions in the rules of your grammar file, like this:

printstmt1
   : (PRINT | QUESTION) printlist? { System.out.println("Print"); /* your Java code */ }
   ;

This lets you address each rule and perform whatever Java action you wish. You can then generate the parser from the grammar with something like:

java -jar antlr-4.5.3-complete.jar jvmBasic.g4 -visitor

After this you can run your code however you wish; here is an example:

// Assumes all generated classes live in the JVM1 package
import JVM1.jvmBasicBaseVisitor;
import JVM1.jvmBasicLexer;
import JVM1.jvmBasicParser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class Jvm extends jvmBasicBaseVisitor<Object> {

    public static void main(String[] args) {
        // Lex and parse the sample line; the embedded grammar actions run during parsing
        jvmBasicLexer lexer = new jvmBasicLexer(new ANTLRInputStream("10 PRINT \"Hello!\""));
        jvmBasicParser parser = new jvmBasicParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.prog();
    }

}

The output for this example would then be just:

Print

You could also call whatever Java methods you like from within the grammar to handle each rule as it is encountered, either developing your own classes and methods to handle it or printing a result directly.

:) Good luck