
Access identifier content when using a custom type in Bison

I have a scanner and a parser ready, using flex and bison.

The parser builds a tree directly in its actions. To do that, I created a struct called STreeNode and I am using

typedef STreeNode* YYSTYPE;

The struct is:

typedef struct tagSTreeNode {
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

There are about 40 tokens, and for every rule I have something like:

assignment {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| function_call_statement {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| goto {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| return {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| conditional {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| repetitive {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
| empty_statement {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}

The signature for the createNode function is

STreeNode *createNode(EOperationType type, int count, ...);
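A minimal sketch of how a variadic createNode with this signature could collect its children using <stdarg.h>. The body is an assumption about what such a function does, not the asker's actual code, and the enum values eIDENTIFIER and eINTEGER are hypothetical (only eUNLABELED_STATEMENT appears in the question).

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical operation types; only eUNLABELED_STATEMENT comes from the question. */
typedef enum {
    eUNLABELED_STATEMENT,
    eIDENTIFIER,
    eINTEGER
} EOperationType;

typedef struct tagSTreeNode {
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

/* Builds a node whose `count` children are passed as STreeNode* varargs.
   Returns NULL if an allocation fails. */
STreeNode *createNode(EOperationType type, int count, ...) {
    STreeNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->type = type;
    node->count = count;
    node->string = NULL;    /* only leaves such as identifiers set this */
    node->children = NULL;
    if (count > 0) {
        node->children = malloc((size_t)count * sizeof *node->children);
        if (node->children == NULL) {
            free(node);
            return NULL;
        }
        va_list args;
        va_start(args, count);  /* `count` is the last named parameter */
        for (int i = 0; i < count; i++)
            node->children[i] = va_arg(args, STreeNode *);
        va_end(args);
    }
    return node;
}
```

Passing count before the variadic children is what tells the loop how many STreeNode * arguments to read with va_arg; a leaf is created with a count of 0 and no extra arguments.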

The tree is working fine. The problem is accessing the actual text of variable names, function names, etc. Since YYSTYPE is a pointer to STreeNode, $x for a token does not hold the string value I want to save in the struct's char *string member.

I have a %token called IDENTIFIER and another called INTEGER, and those are the tokens whose text I need to capture.

While researching, I found that I could use a %union so that each token has a specific type. Could that help? If so, would I need to specify the type of every single token? How would that be implemented?

What about yytext? Couldn't that be used to achieve this goal?

Thank you!

--- EDIT ---

So I've created

%union {
    char *string;
    STreeNode *node;
}

and declared the type of every terminal and nonterminal as one of those. The nodes still work, but the strings (read via $1, for example) are NULL.
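With a %union like the one above, a token declared as %token <string> IDENTIFIER exposes yylval.string as $1, so a rule such as identifier: IDENTIFIER { $$ = createLeaf(eIDENTIFIER, $1); } could move the text into the tree. The sketch below assumes that shape: the nonterminal identifier, the helper createLeaf, and the type name SemanticValue are all hypothetical, and the union is only a plain-C picture of what the %union declares, not the exact code Bison generates.

```c
#include <stdlib.h>

/* Hypothetical subset of the question's EOperationType. */
typedef enum { eIDENTIFIER, eINTEGER } EOperationType;

typedef struct tagSTreeNode {
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

/* Plain-C picture of the %union: each grammar symbol reads one member,
   chosen by its <string> or <node> type tag. */
typedef union {
    char *string;     /* tokens such as IDENTIFIER and INTEGER */
    STreeNode *node;  /* nonterminals built by createNode */
} SemanticValue;

/* Hypothetical helper: wraps a token's text in a childless leaf node.
   The node keeps the pointer it is given, so `text` should be a heap copy
   (e.g. a strdup'd string from the lexer), not yytext itself. */
STreeNode *createLeaf(EOperationType type, char *text) {
    STreeNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->type = type;
    node->count = 0;
    node->children = NULL;
    node->string = text;
    return node;
}
```

This only helps once the scanner actually fills yylval.string; if the lexer never assigns it, $1 stays NULL no matter how the parser uses it.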

Do I need to change anything in the scanner as well? My scanner has:

[a-zA-Z][a-z0-9A-Z]* { return IDENTIFIER; }
[0-9]+ { return INTEGER; }

Thanks again.

Answer

If your tokens have a type declared for them, the lexer needs to set the matching member of yylval before returning the token. Something like:

[a-zA-Z][a-z0-9A-Z]*        { yylval.string = strdup(yytext); return IDENTIFIER; }
[0-9]+                      { yylval.string = strdup(yytext); return INTEGER; }
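The copy matters because yytext points into storage owned by the flex scanner, and its contents are only valid until later scanning reuses that storage. This also answers the earlier yytext question: storing the yytext pointer itself would leave nodes pointing at text that has since been overwritten. The following is a simulation, not flex itself. scan_buffer, fake_scan, lex_identifier, and SemanticValue are hypothetical stand-ins for the scanner's buffer, a single match, the IDENTIFIER action, and the %union type.

```c
#define _POSIX_C_SOURCE 200809L /* make strdup visible in strict ISO modes */
#include <stdlib.h>
#include <string.h>

/* Simulation only: one buffer reused for every token, standing in for
   the flex-owned storage that yytext points at. */
static char scan_buffer[64];
static char *yytext = scan_buffer;

/* Hypothetical stand-in for the semantic-value type from the %union. */
typedef union {
    char *string;
} SemanticValue;
static SemanticValue yylval;

/* Hypothetical stand-in for one flex match: the new lexeme replaces the
   previous one in the shared buffer. */
static void fake_scan(const char *lexeme) {
    strncpy(scan_buffer, lexeme, sizeof scan_buffer - 1);
    scan_buffer[sizeof scan_buffer - 1] = '\0';
}

/* Mirrors the IDENTIFIER action above: copy the text before "returning". */
static void lex_identifier(const char *lexeme) {
    fake_scan(lexeme);
    yylval.string = strdup(yytext);
}
```

After scanning "foo" and then "bar", the strdup'd copy from the first token still reads "foo", while a saved yytext pointer now reads "bar". Because each leaf then owns its own heap string, those strings should be freed when the tree is destroyed.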