algui91 algui91 -4 years ago 106
HTML Question

Exclude some characters from a Lex regex

I am trying to build a regex for lex that matches bold text in Markdown syntax. For example:

__strong text__
I thought of this:

__[A-Za-z0-9_ ]+__


And then replace the text with

<strong>Matched text</strong>


But in Lex, this rule causes the variable yytext to be __Matched Text__. How could I get rid of the underscores? Would it be better to create a regex that does not match the underscores, or to process the variable yytext to remove them?

With capturing groups it would be easier, because I would only need the regex:

__([A-Za-z0-9_ ]+)__


And use \1. But Lex does not support capturing groups.

Answer



I finally went with the first option offered by João Neto, slightly modified:

yytext[strlen(yytext)-len]='\0'; // exclude last len characters
yytext+=len; // exclude first len characters
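
The same trimming can be wrapped in a small function so that yytext itself is never advanced. This is only a sketch: strip_delims is a hypothetical name, not part of lex, and it assumes the token starts and ends with delimiters of the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strips `len` delimiter characters from each end of a
 * matched token, in place. Returns a pointer to the inner text, or NULL if
 * the token is too short to hold both delimiters. */
static char *strip_delims(char *token, size_t len)
{
    assert(token != NULL);
    size_t n = strlen(token);
    if (n < 2 * len)
        return NULL;
    token[n - len] = '\0';   /* drop the trailing delimiter */
    return token + len;      /* skip the leading delimiter */
}
```

Inside a lex action this could be used as printf("<strong>%s</strong>", strip_delims(yytext, 2)); the returned pointer still refers to the scanner's buffer, so it is only valid until the next token is matched.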


I also tried start conditions, which he mentioned as the second option, but that did not work.

Answer Source

You can process yytext by removing the first and last two characters.

yytext[strlen(yytext)-2]='\0'; // exclude last two characters
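yylval.str = &yytext[2]; // exclude first two characters
// yytext is reused for the next token: copy the string (e.g. with strdup) if it must outlive this action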

Another option is to use a start-condition stack:

%option stack
%x bold

%%

"__"         { yy_push_state(bold); yylval.str = new std::string(); }
<bold>"__"   { yy_pop_state(); return BOLD_TOKEN; }
<bold>.|\n   { *yylval.str += yytext; /* str is a pointer: dereference before appending */ }
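
To see what the two states are doing, the start-condition idea can be imitated by hand in plain C. The sketch below is not lex code and is not the answer's method: bold_to_html and the state names are hypothetical, and it simply toggles between an initial and a bold state each time it sees "__", writing the HTML tags the question asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the lex start conditions INITIAL and bold. */
enum state { INITIAL_STATE, BOLD_STATE };

/* Hypothetical helper: copies `in` to `out`, replacing each "__" with an
 * opening or closing <strong> tag depending on the current state.
 * Returns 0 on success, -1 if `out` (capacity `cap`) is too small.
 * An unterminated "__" leaves the <strong> tag unclosed. */
static int bold_to_html(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    enum state st = INITIAL_STATE;
    size_t used = 0;

    while (*in != '\0') {
        const char *piece;
        size_t piece_len;

        if (in[0] == '_' && in[1] == '_') {
            /* Delimiter: emit a tag and switch state. */
            piece = (st == INITIAL_STATE) ? "<strong>" : "</strong>";
            piece_len = strlen(piece);
            st = (st == INITIAL_STATE) ? BOLD_STATE : INITIAL_STATE;
            in += 2;
        } else {
            /* Any other character is copied through unchanged. */
            piece = in;
            piece_len = 1;
            in += 1;
        }
        if (used + piece_len + 1 > cap)   /* keep room for the final NUL */
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    if (used >= cap)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With a lex-generated scanner the toggling is done by the scanner's start conditions; this version only makes the state change visible.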