Htbaa Htbaa - 1 year ago 73
Perl Question

HOP::Lexer with overlapping tokens

I'm using HOP::Lexer to scan BlitzMax module source code to fetch some data from it. One particular piece of data I'm currently interested in is a module description.

Currently I'm searching for a description in either of these formats:

ModuleInfo "Description: foobar"
ModuleInfo "Desc: foobar"

This works fine. Sadly, most modules I scan define their description elsewhere, inside a comment block, which is actually the common way to do it in BlitzMax because the documentation generator expects it.

This is how those modules define their description in the main source file:

Rem
bbdoc: my module description
End Rem
Module namespace.modulename

This in itself isn't really a problem either. But the line after the End Rem also contains data I want (the module name). That is a problem: the two token definitions now overlap, and once the first one has matched, the lexer continues from where that match ended in the content being scanned. The token for the module name therefore never matches anything.

Yes, I've made sure my order of tokens is correct. It just doesn't seem possible (somewhat understandable) to move the cursor back a line.
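To make the overlap concrete, here is a minimal sketch in plain Perl. It does not use HOP::Lexer; the loop below is a simplified stand-in that tries each pattern at the current position with `\G` and `/gc`, and the `MODULENAME` and `OTHER` patterns are assumptions made up for the illustration. Only the `MODULEDESCRIPTION` pattern is the one from this question.

```perl
use strict;
use warnings;

# Sample input: a Rem block directly followed by the Module line.
my $src = "Rem\nbbdoc: my module description\nEnd Rem\nModule namespace.modulename\n";

# Token table, tried in order at the current scan position.
# MODULENAME and OTHER are hypothetical patterns for this demo only.
my @table = (
    [ MODULEDESCRIPTION => qr/[ \t]*\bRem\n(?:\n|.)*?\s*\bEnd[ \t]*Rem\nModule[\s\t]+/i ],
    [ MODULENAME        => qr/\bModule[ \t]+\w+\.\w+/i ],
    [ OTHER             => qr/[^\n]+\n?|\n/ ],
);

my @found;
pos($src) = 0;
SCAN: while (pos($src) < length $src) {
    for my $entry (@table) {
        my ($label, $re) = @$entry;
        # \G anchors at the current position; /c keeps pos() on failure.
        if ($src =~ /\G($re)/gc) {
            push @found, $label;
            next SCAN;
        }
    }
    last;    # nothing matched here; stop instead of looping forever
}

print join(', ', @found), "\n";    # MODULEDESCRIPTION, OTHER
```

The description pattern consumes everything up to and including "Module ", so by the time the module name pattern gets a turn, the scan position is already past the keyword it needs.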

A small piece of code for fetching the description from within a Rem-End Rem block which is above a module definition (not worked out, but working for the current test case):

qr/[ \t]*\bRem\n(?:\n|.)*?\s*\bEnd[ \t]*Rem\nModule[\s\t]+/i,
sub {
    my ($label, $value) = @_;
    # Capture in list context so a failed match yields undef, not a stale $1.
    my ($desc) = $value =~ /bbdoc: (.+)/;
    [$label, $desc];
}

So in my test case I first scan for a single comment, then the block above (MODULEDESCRIPTION), then a block comment (Rem-End Rem), module name, etc.

Currently the only solution I can think of is to set up a second lexer just for the module description, though I'd rather not. Is what I want possible at all with HOP::Lexer?

Source of my Lexer can be found at

Answer Source

I've solved it by adding (a slightly modified version of) the MODULEDESCRIPTION token. Inside its subroutine I simply extract the module name as well and return an arrayref with 4 elements instead of 2. Later on I iterate over the lexer output to build a nice, usable array of tokens and their values.
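The iteration step could look roughly like the sketch below. The contents of `@raw` are hypothetical sample data standing in for what the lexer returned, and the variable names are mine, not taken from the actual source.

```perl
use strict;
use warnings;

# Hypothetical lexer output: ordinary tokens are [LABEL, VALUE]; the combined
# token carries two label/value pairs in a single arrayref.
my @raw = (
    [ 'COMMENT', "' a single line comment" ],
    [ 'MODULEDESCRIPTION', 'my module description',
      'MODULENAME',        'namespace.modulename' ],
);

my @tokens;
for my $tok (@raw) {
    next unless ref $tok eq 'ARRAY';    # skip anything that is not a token arrayref
    my @parts = @$tok;
    # Peel off label/value pairs two at a time.
    while (my ($label, $value) = splice(@parts, 0, 2)) {
        push @tokens, [ $label, $value ];
    }
}

printf "%-18s %s\n", @$_ for @tokens;
```

After this pass every entry in `@tokens` is a plain two-element pair again, so the rest of the code does not need to know that one token was doing double duty.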

Solution is again at

Edit: Or let me just paste the piece of code here


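            qr/[ \t]*\bRem\R(?:\R|.)*?\bEnd[ \t]*Rem\R\bModule[\s\t]\w+\.\w+/i,
            sub {
                my ($label, $value) = @_;
                # Pull both pieces of data out of the single matched block.
                my ($desc) = ($value =~ /\bbbdoc: (.+)/i);
                my ($name) = ($value =~ /\bModule (\w+\.\w+)/i);
                # Two label/value pairs in one arrayref; split up later.
                [$label, $desc, 'MODULENAME', $name];
            }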
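For anyone who wants to try this without the full lexer, here is a small standalone check. It applies the pattern and subroutine from this answer, copied unchanged, to a sample block. The sample input and the `$pattern`, `$handler` and `$src` names are assumptions for the demo, and calling the handler by hand only imitates what HOP::Lexer does with a matched token. `\R` needs Perl 5.10 or newer.

```perl
use 5.010;
use strict;
use warnings;

# Pattern and handler from the answer.
my $pattern = qr/[ \t]*\bRem\R(?:\R|.)*?\bEnd[ \t]*Rem\R\bModule[\s\t]\w+\.\w+/i;
my $handler = sub {
    my ($label, $value) = @_;
    my ($desc) = ($value =~ /\bbbdoc: (.+)/i);
    my ($name) = ($value =~ /\bModule (\w+\.\w+)/i);
    [$label, $desc, 'MODULENAME', $name];
};

# Hypothetical sample input in the format described in the question.
my $src = "Rem\nbbdoc: my module description\nEnd Rem\nModule namespace.modulename\n";

# Grab the whole block, including the Module line, in one match.
my ($block) = $src =~ /($pattern)/
    or die "pattern did not match the sample input\n";

# Call the handler the way a lexer would: label first, matched text second.
my $token = $handler->('MODULEDESCRIPTION', $block);

say join(' | ', @$token);
# MODULEDESCRIPTION | my module description | MODULENAME | namespace.modulename
```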