questions questions - 1 month ago 6x
C Question

Finding roots for garbage collection in C

I'm trying to implement a simple mark and sweep garbage collector in C. The first step of the algorithm is finding the roots. So my question is how can I find the roots in a C program?

In the programs using malloc, I'll be using the custom allocator. This custom allocator is all that will be called from the C program, and may be a custom init().

How does garbage collector knows what all the pointers(roots) are in the program? Also, given a pointer of a custom type how does it get all pointers inside that?

For example, if there's a pointer p pointing to a class list, which has another pointer inside it.. say q. How does garbage collector knows about it, so that it can mark it?

Update: How about if I send all the pointer names and types to GC when I init it? Similarly, the structure of different types can also be sent so that GC can traverse the tree. Is this even a sane idea or am I just going crazy?


First off, garbage collectors in C, without extensive compiler and OS support, have to be conservative, because you cannot distinguish between a legitimate pointer and an integer that happens to have a value that looks like a pointer. And even conservative garbage collectors are hard to implement. Like, really hard. And often, you will need to constrain the language in order to get something acceptable: for instance, it might be impossible to correctly collect memory if pointers are hidden or obfuscated. If you allocate 100 bytes and only keep a pointer to the tenth byte of the allocation, your GC is unlikely to figure out that you still need the block since it will see no reference to the beginning. Another very important constraint to control is the memory alignment: if pointers can be on unaligned memory, your collector can be slowed down by a factor of 10x or worse.

This brings us to a very important point: your pointers need to be as little confusable with normal integers as possible. I have very few tips for that, but I'd try having the most significant bit set. There's no actual science behind it, but my gut feeling is that most programs work with more positive values. This could help you have fewer false positives, although you will likely need operating system support to specify preferred allocation locations.

To find roots, you need to know where your stacks start, and where your stacks end. Notice the plural form: each thread has its own stack, and you might need to account for that, depending on your objectives. To know where a stack starts, without entering into platform-specific details (that I probably wouldn't be able to provide anyways), you can use assembly code inside the main function of the current thread (just main in a non-threaded executable) to query the stack register (esp on x86, rsp on x86_64 to name those two only). Gcc and clang support a language extension that lets you assign a variable permanently to a register, which should make it easy for you:

register void* stack asm("esp"); // replace esp with the name of your stack reg

(register is a standard language keyword that is most of the time ignored by today's compilers, but coupled with asm("register_name"), it lets you do some nasty stuff.)

To ensure you don't forget important roots, you should defer the actual work of the main function to another one. (On x86 platforms, you can also query ebp/rbp, the stack frame base pointers, instead, and still do your actual work in the main function.)

int main(int argc, const char** argv, const char** envp)
    register void* stack asm("esp");
    // put stack somewhere
    return do_main(argc, argv, envp);

Once you enter your GC to do collection, you need to query the current stack pointer for the thread you've interrupted. You will need design-specific and/or platform-specific calls for that (though if you get something to execute on the same thread, the technique above will still work).

The actual hunt for roots starts now. Good news: most ABIs will require stack frames to be aligned on a boundary greater than the size of a pointer, which means that if you trust every pointer to be on aligned memory, you can treat your whole stack as a intptr_t* and check if any pattern inside looks like any of your managed pointers.

Obviously, there are other roots. Global variables can (theoretically) be roots, and fields inside structures can be roots too. You need to separately account for global variables that can be roots (or forbid that altogether, which isn't a bad idea in my opinion) because automatic discovery of those would be hard (at least, I wouldn't know how to do it on any platform).

These roots can lead to references on the heap, where things can go awry if you don't take care.

Since not all platforms provide malloc introspection (as far as I know), you need to implement the concept of scanned memory--that is, memory that your GC knows about. It needs to know at least the address and the size of each of such allocation. When you get a reference to one of these, you simply scan them for pointers, just like you did for the stack. (This means that you should take care that your pointers are aligned. This is normally the case if you let your compiler do its job, but you still need to be careful when you use third-party APIs).

This also means that you cannot put references to collectable memory to places where the GC can't reach it. And this is where it hurts the most and where you need to be extra-careful. Otherwise, if your platform supports malloc introspection, you can easily tell the size of each allocation you get a pointer to and make sure you don't overrun them.

This just scratches the surface of the topic. Garbage collectors are extremely complex, even when single-threaded. When you add threads to the mix, you enter a whole new world of hurt.

Apple has implemented such a conservative GC for the Objective-C language and dubbed it libauto. They have open-sourced it, along with a good part of the low-level technologies of Mac OS X, and you can find the source here.

I can only quote Hot Licks here: good luck!

Okay, before I go even further, I forgot something very important: compiler optimizations can break the GC. If your compiler is not aware of your GC, it can very well never put certain roots on the stack (only dealing with them in registers), and you're going to miss them. This is not too problematic for single-threaded programs if you can inspect registers, but again, a huge mess for multithreaded programs.

Also be very careful about the interruptibility of allocations: you must make sure that your GC cannot kick in while you're returning a new pointer because it could collect it right before it is assigned to a root, and when your program resumes it would assign that new dangling pointer to your program.

And here's an update to address the edit:

Update: How about if I send all the pointer names and types to GC when I init it? Similarly, the structure of different types can also be sent so that GC can traverse the tree. Is this even a sane idea or am I just going crazy?

I guess you could allocate our memory then register it with the GC to tell it that it should be a managed resource. That would solve the interruptability problem. But then, be careful about what you send to third-party libraries, because if they keep a reference to it, your GC might not be able to detect it since they won't register their data structures with your GC.

And you likely won't be able to do that with roots on the stack.