Sango Dragon - 2 months ago
Java Question

Parsing in Java

I have a few theoretical ideas, but I don't know the language well. We basically need to make a rudimentary lexical analyzer. I have most of the bits, but here's the context. The straightforward question is at the end.

I need to read in a line from the console, then see if parts match up with a symbol table, which includes the keywords:

"print [variable]"
"load [variable]"
"mem [variable]"
as well as mathematical symbols.

It also needs to recognise variables on their own, as in
"c = a + b"

It's not that hard, in theory. You'd check whether the first character of the string matches up with a keyword or variable. If it does, keep looping through that keyword or variable to check that it's the same string, up until you hit a space.

To summarize: how do I check the characters of a string I've read in and compare them to keywords or variables in Java?


I recommend using regex for text pattern matching. You can receive the text from the console as a command-line argument, via the args array of the main method. Here's a small example:

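import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class Parser {
    public static void main(final String[] args) {
        if (args.length < 1) {
            // Error, no input given
            System.err.println("No input given");
            return;
        }

        String input = args[0];

        // Compile the pattern, then create a matcher for the input
        Pattern pattern = Pattern.compile("YOUR REGEX HERE");
        Matcher matcher = pattern.matcher(input);

        if (matcher.find()) {
            // Input matches the Regex pattern
            // Access to capturing groups using matcher.group(x)
            // Example: System.out.println(matcher.group(1));
        }
    }
}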
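
If "read in a line from the console" means standard input rather than the args array, a sketch along these lines might work instead. The class name ConsoleLines and the helper readLine are hypothetical names chosen for illustration. Trimming the line and returning an empty string at end of input are assumptions, not requirements from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper class; the names are illustrative only.
final class ConsoleLines {
    // Reads one line from the given Reader and trims surrounding whitespace.
    // Returns "" when the input is already at its end (an assumed convention).
    static String readLine(Reader source) {
        try {
            BufferedReader reader = new BufferedReader(source);
            String line = reader.readLine(); // null at end of input
            return line == null ? "" : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Wrap System.in so the user can type a line and press Enter.
        String input = readLine(new InputStreamReader(System.in));
        System.out.println("Read: " + input);
    }
}
```

The resulting string can be passed to pattern.matcher(...) in the same way as args[0].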

For Regex you can find various explanations on the web and on SO. You can try out your patterns at regex101.

Here's an example pattern which matches "name = name + name":

(.+) = (.+) \+ (.+)

The () create capturing groups. Using matcher.group(x) for x from 1 to 3, you can access the matched values inside the brackets, i.e. the variables.
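
To make the group access concrete, here is a minimal sketch that runs the pattern above against a sample line. The class name GroupDemo, the helper extractNames, and the input "c = a + b" are assumptions made up for this illustration. It uses matches(), so the whole line has to fit the pattern; find() would also accept a match anywhere inside the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class GroupDemo {
    // The pattern from the answer; "\\+" is the Java string form of \+.
    private static final Pattern ASSIGNMENT =
            Pattern.compile("(.+) = (.+) \\+ (.+)");

    // Returns the three captured values, or an empty list if the line does not match.
    static List<String> extractNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher matcher = ASSIGNMENT.matcher(line);
        if (matcher.matches()) {
            // Group 0 is the whole match; groups 1..3 are the bracketed parts.
            for (int x = 1; x <= matcher.groupCount(); x++) {
                names.add(matcher.group(x));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extractNames("c = a + b")); // prints [c, a, b]
    }
}
```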
Here's the same example online with test input:

Fairly easy. However, you may need to make the pattern more robust. As written, .+ matches any characters at all, so a stricter pattern should not accept whitespace characters or special characters (for example a +) inside a variable name, and so on.
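
One possible way to tighten the pattern, sketched under assumptions: restrict a variable name to a letter or underscore followed by letters, digits, or underscores, and use \s* so spacing around the operators is optional. That naming rule is an assumption, since the question does not say what a valid variable looks like. The class name StatementMatcher and the helper classify are likewise hypothetical. The same approach covers the print, load, and mem keywords.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; class and method names are illustrative only.
final class StatementMatcher {
    // Assumed naming rule: letter or underscore, then letters, digits, underscores.
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_]*";

    // "print x", "load x", "mem x": keyword, whitespace, one variable.
    private static final Pattern KEYWORD =
            Pattern.compile("\\s*(print|load|mem)\\s+(" + NAME + ")\\s*");

    // "c = a + b" with optional spaces around "=" and "+".
    private static final Pattern ADDITION = Pattern.compile(
            "\\s*(" + NAME + ")\\s*=\\s*(" + NAME + ")\\s*\\+\\s*(" + NAME + ")\\s*");

    // Returns a short description of the statement, or "unknown" if nothing matches.
    static String classify(String line) {
        Matcher keyword = KEYWORD.matcher(line);
        if (keyword.matches()) {
            return keyword.group(1) + ":" + keyword.group(2);
        }
        Matcher addition = ADDITION.matcher(line);
        if (addition.matches()) {
            return "assign:" + addition.group(1) + ","
                    + addition.group(2) + "," + addition.group(3);
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("print total")); // print:total
        System.out.println(classify("c=a + b"));     // assign:c,a,b
        System.out.println(classify("c = a b + d")); // unknown
    }
}
```

This sketch does not stop a keyword such as print from being used as a variable name in an assignment; a symbol-table lookup after matching would be one way to handle that.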