William Jockusch William Jockusch - 9 months ago 49
C# Question

Antlr parser StackOverflowException

I have the following in my Antlr parser grammar for my Xamarin.ios C# project:

: DIGIT #Digit
| NULL #Null
| LESSTHAN #LessThan
| GREATERTHAN #GreaterThan
| anyLessThanOrEqual #LessThanOrEqual
// about 30 more options here

: mathToken mathTokenList #CompoundMathTokens
| mathToken #SingleMathToken

This works great for a list of 10 tokens, 100, or even 1000. But once the list gets long enough, it leads to a StackOverflowException, as the generated MathTokenList recursively calls itself, with some listener code at the top:

MyNamespace.HandleToken(MyTokenClass parserToken, List<MyOtherTokenClass> buildingList) in MyNamespace.ManualFileParser.cs:58
MyNamespace.CustomStringReaderParseListener.VisitDefault(Antlr4.Runtime.Tree.TerminalNodeImpl node) in MyNamespace.CustomStringReaderParseListener.cs:228
MyNamespace.CustomStringReaderParseListener.VisitTerminal(Antlr4.Runtime.Tree.TerminalNodeImpl node) in MyNamespace.CustomStringReaderParseListener.cs:94
Antlr4.Runtime.Parser.Match(int ttype)
Antlr.StringReaderParser.mathTokenList() // lots of calls here . . . seems to be
Antlr.StringReaderParser.mathTokenList() // one for each token.
Antlr.StringReaderParser.mathTokenList() // ( in antlr generated code)

Is it possible to restructure the parser grammar to avoid this kind of problem? Or do I need to do something more involved so that the parser never sees a really long list of mathTokens?

I could put a band-aid on the problem by combining lists of digits before the parser sees them. But it would likely recur eventually with some other token type.

Answer Source

You cannot entirely avoid this problem. Each rule invocation is actually a function call in the generated parser (recursive descent parser principle). If your input is only complex enough you will certainly hit the available stack limit. In most (all?) compilers you can increase the stackspace for your app, but this also helps only up to a certain extent.

However, as @BartKiers suggested you can lower the invocation count by using loops instead of recursions (a principle often used in programming). The mathTokenList rule looks very much like how you would define a list in yacc/bison. In ANTLR you can use loop operators instead, which make this better to read and less resource intensive:

mathTokenList: mathToken+;

is the way to go here. This will end up in a loop being executed in your mathTokenList method (look at the generated parser code, sometimes quite enlightening).