Joel Joel - 1 month ago 18
C Question

BF compiler optimizations

I wrote simple brainfuck2c translator. I have added slight optimizations (for pointer moves), but I think about optimizing additions/subtractions, for example we have code like

++++
that would translate into
mem[0]+=1;mem[0]+=1;mem[0]+=1;mem[0]+=1;
. My goal is make my interpreter optimize additions and subtraction to output following code for this:
mem[0]+=4;
. Code i wrote (with only arrows optimization); I know it's not "c++ style" but i am C programmer. So how to implement addition/subtranction optimizations? I have googled somewhat, but i found only python solutions, and i didnt use them because i don't understand them, and solutions using external libraries (e.g. LLVM).

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv){
int pos = 0;
int amount = 0;
int brackets = 0;
if(argc!=2)exit(0);
FILE* f = fopen(argv[1], "r");
if(f!=NULL){
char* outbuffer = malloc(1024*1024*1024);
char c = fgetc(f);
strcpy(outbuffer, "int getch(void);\nvoid putch(int);\nint main(){\n");
while(!feof(f)){
if(c == '<'){
if(pos==0)pos=65536;
else pos--;
}
if(c == '>'){
if(pos==65536)pos=0;
else pos++;
}
if(c == '['){
brackets++;
sprintf(outbuffer, "while(mem[%d]){", pos);
}
if(c == ']'){
brackets--;
sprintf(outbuffer, "}");
}
if(c == '+'){
sprintf(outbuffer, "mem[%d]+=1;", pos);
}
if(c == '-'){
sprintf(outbuffer, "mem[%d]-=1;", pos);
}
if(c == '.'){
sprintf(outbuffer, "putch(mem[%d]);", pos);
}if(c == '.'){
sprintf(outbuffer, "mem[%d] = getch();", pos);
}
c = fgetc(f);
}
if(brackets == 1){
printf("Compilation succesfull. ");
printf("Generated Code:\n%s", outbuffer);
free(outbuffer);
}
else{
printf("Comilation fault. Unbalanced brackets.");
free(outbuffer);
}
}
}


Okay, i have found an answer. I created separate functions; It can be helpful when writing a compiler with C backend; Thanks to Gene for help.

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int minus(FILE* f){
int c;
int amount = 0;
while ((c = getc(f)) == '-')
amount++;
ungetc(c, f);
return amount;
}

int plus(FILE* f){
int c;
int amount = 0;
while ((c = getc(f)) == '+')
amount++;
ungetc(c, f);
return amount;
}

int main(int argc, char** argv){
int pos = 0;
int brackets = 0;
if(argc!=2)exit(0);
int amount = 1;
FILE* f = fopen(argv[1], "r");
if(f!=NULL){
char* outbuffer = (char*)malloc(1024*1024);
strcpy(outbuffer, "int getch(void);\nvoid putch(int);\nint main(){\n");
while(!feof(f)){
printf("Iterating.");
char c = fgetc(f);
if(c == '<'){
if(pos==0)pos=65536;
else pos--;
}
if(c == '>'){
if(pos==65536)pos=0;
else pos++;
}
if(c == '['){
brackets++;
sprintf(outbuffer, "%swhile(mem[%d]){",outbuffer,pos);
}
if(c == ']'){
brackets--;
sprintf(outbuffer, "%s}", outbuffer);
}
if(c == '+'){
amount = plus(f)+1;
sprintf(outbuffer, "%smem[%d]+=%d;",outbuffer, pos, amount);
}
if(c == '-'){
amount = minus(f)+1;
sprintf(outbuffer, "%smem[%d]-=%d;",outbuffer, pos, amount);
}
if(c == '.'){
sprintf(outbuffer, "%sputch(mem[%d]);",outbuffer, pos);
}
if(c == '.'){
sprintf(outbuffer, "%smem[%d]=getch();",outbuffer, pos);
}
}
if(brackets == 0){
printf("Compilation succesfull. ");
printf("Generated Code:\n%s", outbuffer);
free(outbuffer);
}
else{
printf("Comilation fault. Unbalanced brackets.");
free(outbuffer);
}
}
}

Answer

The general approach that compilers use for these kinds of simple optimizations is to generate code lazily.

Rather than emit code for a given HLL feature immediately, it's stored in a convenient data structure that serves as a buffer. After each new chunk of code is added, the optimizer inspects the buffer and applies pattern matching rewrite rules that reduce the cost of the buffered code without changing its meaning.

In your case, the rule would look like X += m; X += n; --> X += (m+n);. I.e., rewrite two increments of the same l-value as a single increment by the sum of the original two.

Another trivial example: X += 0 --> <nothing>. I.e., erase increments by zero.

Code is kept in the buffer as long as it might yet match one of the rules. When that's impossible, it's finally emitted and deleted.

The buffer is sometimes called a "peephole" into the instruction output stream. Therefore this method of improving code is called "peephole optimization". You can learn a lot by searching for this term.

There are lots of schemes for implementing the buffer. For expressions, a symbolic operand stack can work well. For statements, the usual approach is to store a FIFO queue of 2- or 3- address code, an abstract representation of output code.

Comments