James Ko - 6 months ago
C# Question

ANTLR: How to avoid re-parsing entire file when user modifies text

Edit: For those who want to see exactly what I'm doing, the source code of my app can be found here.

I'm building a code editor app with C# that offers syntax highlighting. I'm currently using ANTLR for C# to parse the code in order to highlight it. So far, my app can highlight the code really fast when the user initially opens the file. However, I haven't written any code to re-highlight the text when the user starts editing it.

I want the editor to perform well for large files, so I don't want to re-parse the entire file each time the user types a character. I did a bit of research, and it seems like what I'm looking for is an incremental parser. Unfortunately, it seems like ANTLR v4 can't do incremental parsing, so I'm unsure what to do.

My question is: is there another approach I can take, using ANTLR, to avoid freezing the app whenever the user types? I'm really hesitant to give up on ANTLR since there are a bunch of free grammars available for it, so it's not much work to add support for a new language. I've looked into TextMate grammars (VSCode uses lots of them), but I don't understand them and there are no C# libraries available to manipulate them.

Thanks for helping!

Answer Source

I don't parse after every keystroke, but I do parse the entire file. This works great for intermediate-size files in the domain-specific languages I've created. Instead of trying to parse only parts of the file, I use a mixed approach, parsing as soon as any one of three conditions is met:

  1. The user has typed n characters.
  2. A timer reports that there has been no change for m milliseconds.
  3. For some grammars, the user types a line terminator/separator character.

Bottom line is, you might be surprised at how much time people spend pausing and thinking as they're typing in anything that imposes a grammar on them. These pauses can be exploited to do useful work while the user thinks, even if a pause lasts only 400 milliseconds. I use #1 and #2 in the DSLs I've created for work due to their syntax.

The "no change" clock gets reset after every keystroke event, and the n-characters counter of course gets reset whenever a parse runs. I've found that a combination approach like this works well in an IDE-type environment.
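The trigger logic described above can be sketched as a small class that is independent of any UI framework. This is only an illustration under stated assumptions, not the code I actually use: `ParseTrigger`, `OnKeystroke`, `OnTick` and `MarkParsed` are hypothetical names, the current time is passed in explicitly, and the caller is assumed to run the ANTLR parse itself whenever one of the methods returns true.

```csharp
using System;

// Hypothetical helper: decides when a full re-parse should start.
public sealed class ParseTrigger
{
    private readonly int _maxChars;         // "n" characters
    private readonly TimeSpan _idleDelay;   // "m" milliseconds without a change
    private readonly bool _parseOnNewline;  // condition 3, only for some grammars
    private int _charsSinceParse;
    private DateTime _lastKeystroke;
    private bool _dirty;                    // true if text changed since the last parse

    public ParseTrigger(int maxChars, TimeSpan idleDelay, bool parseOnNewline = false)
    {
        _maxChars = maxChars;
        _idleDelay = idleDelay;
        _parseOnNewline = parseOnNewline;
    }

    // Call on every keystroke. Returns true if a parse should start now.
    public bool OnKeystroke(char c, DateTime now)
    {
        _dirty = true;
        _lastKeystroke = now;               // resets the "no change" clock
        _charsSinceParse++;
        bool hitCount = _charsSinceParse >= _maxChars;
        bool hitNewline = _parseOnNewline && c == '\n';
        return hitCount || hitNewline;
    }

    // Call periodically, e.g. from a UI timer tick.
    // Returns true if there are unparsed changes and the user has been idle long enough.
    public bool OnTick(DateTime now)
    {
        return _dirty && now - _lastKeystroke >= _idleDelay;
    }

    // Call when a parse has been started so that both triggers start over.
    public void MarkParsed()
    {
        _charsSinceParse = 0;
        _dirty = false;
    }
}
```

A caller would construct it with something like `new ParseTrigger(10, TimeSpan.FromMilliseconds(800))`, feed it keystrokes and timer ticks, and call `MarkParsed()` each time it kicks off a parse. Passing the time in rather than reading the clock inside the class keeps the logic easy to test.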

One thing to remember if you do this: don't move the text control's insertion point when a syntax error is found, because errors are unavoidable while the user is still typing. I simply show a message in a label:

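    // Assumed to live in a class derived from DefaultErrorStrategy (GetTokenErrorDisplay is inherited from it).
    public override void Recover(Parser recognizer, RecognitionException e)
    {
        IToken token = recognizer.CurrentToken;
        string message = string.Format("parse error at line {0}, position {1} right before {2} ", token.Line, token.Column, GetTokenErrorDisplay(token));
        BasicEnvironment.SyntaxError = message; // shown in a label; the caret is left alone
    }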

In my environment, I get great results with values of 800 milliseconds and 10 characters; in practice it is usually the timer, not the character count, that kicks off the parse.
