Ian D. Scott Ian D. Scott - 2 months ago 16
Brainfuck Question

Parsing brainf*ck code to tree in Rust

I am trying to write an optimizing brainfuck compiler in Rust. Currently it stores tokens in a flat vector, which works, but I am having trouble changing it to use a syntax tree:

#[derive(Clone, PartialEq, Eq)]
pub enum Token {
Output,
Input,
Loop(Vec<Token>),
Move(i32),
Add(i32, i32),
LoadOut(i32, i32),
}
use Token::*;

pub fn parse(code: &str) -> Vec<Token> {
let mut alltokens = Vec::new();
let mut tokens = &mut alltokens;
let mut tokvecs: Vec<&mut Vec<Token>> = Vec::new();
for i in code.chars() {
match i {
'+' => tokens.push(Add(0, 1)),
'-' => tokens.push(Add(0, -1)),
'>' => tokens.push(Move(1)),
'<' => tokens.push(Move(-1)),
'[' => {
tokens.push(Loop(Vec::new()));
tokvecs.push(&mut tokens);
if let &mut Loop(mut newtokens) = tokens.last_mut().unwrap() {
tokens = &mut newtokens;
}
},
']' => {
tokens = tokvecs.pop().unwrap();
},
',' => tokens.push(Input),
'.' => {
tokens.push(LoadOut(0, 0));
tokens.push(Output);
}
_ => (),
};
}

alltokens
}


What I am having trouble figuring out is how to handle the
[
command. The current implementation in the code is one of several I have tried, all of which have failed. I think it may require use of Rust's
Box
, but I can't quite understand how that is used.

The branch handling the
[
command is probably completely wrong, but I'm not sure how it should be done. It pushes a
Loop
(a variant of the
Token
enum) containing a vector to the
tokens
vector. The problem is to then get a mutable borrow of the vector in that
Loop
, which the
if let
statement is supposed to do.

The code fails to compile since
newtokens
does not outlive the end of the
if let
block. Is it possible to get a mutable reference to the vector inside
Loop
, and set
tokens
to it? If not, what could be done instead?

Answer

I've gotten the code to work by making it a recursive function:

#[derive(Clone, PartialEq, Eq)]
pub enum Token {
    Output,
    Input,
    Loop(Vec<Token>),
    Move(i32),
    Add(i32, i32),
    LoadOut(i32, i32),
}
use Token::*;

pub fn parse(code: &str) -> Vec<Token> {
    _parse(&mut code.chars())
}

fn _parse(chars: &mut std::str::Chars) -> Vec<Token> {
    let mut tokens = Vec::new();
    while let Some(i) = chars.next() {
        match i {
            '+' => tokens.push(Add(0, 1)),
            '-' => tokens.push(Add(0, -1)),
            '>' => tokens.push(Move(1)),
            '<' => tokens.push(Move(-1)),
            '[' => tokens.push(Loop(_parse(chars))),
            ']' => { break; }
            ',' => tokens.push(Input),
            '.' => {
                tokens.push(LoadOut(0, 0));
                tokens.push(Output);
            }
            _ => (),
        };
    }

    tokens
}

It seems to work, and is reasonably simple and elegant (I'd still be interested to see a solution that doesn't use recursion).

Comments