Dmitry Dmitry - 1 year ago 51
C Question

Is it practical to create a C language addon for anonymous functions?

I know that C compilers are capable of taking standalone code and generating standalone shellcode from it for the specific system they are targeting.

For example, given the following in anon.c:


int give3() {
    return 3;
}

I can run

gcc anon.c -o anon.obj -c
objdump -D anon.obj

which gives me (on MinGW):

anon1.obj: file format pe-i386

Disassembly of section .text:

00000000 <_give3>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: b8 03 00 00 00 mov $0x3,%eax
8: 5d pop %ebp
9: c3 ret
a: 90 nop
b: 90 nop

So I can make main like this:


#include <stdio.h>
#include <stdint.h>

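int main(int argc, char **argv)
{
    /* machine code of give3, copied from the disassembly above */
    uint8_t shellcode[] = {
        0x55,                           /* push %ebp */
        0x89, 0xe5,                     /* mov %esp,%ebp */
        0xb8, 0x03, 0x00, 0x00, 0x00,   /* mov $0x3,%eax */
        0x5d, 0xc3,                     /* pop %ebp; ret */
    };

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}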

My question is: is it practical to automate this conversion for a self-contained anonymous function that refers to nothing outside its own scope and arguments, with something like this?


#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint8_t shellcode[] = [@[
        int anonymous() {
            return 3;
        }
    ]@];

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}

That is, the compiler would compile the embedded text into shellcode and place it into the buffer.

The reason I ask is that I really like writing C, but writing pthreads code and callbacks is incredibly painful. As soon as you go one step above C to get the notion of "lambdas", you lose your language's ABI (e.g. C++ has lambdas, but everything you do in C++ is suddenly implementation dependent). "Lisp-like" scripting add-ons (e.g. plugging in Lisp, Perl, JavaScript/V8, or any other runtime that already knows how to generalize callbacks) make callbacks very easy, but also much more expensive than tossing shellcode around.

If this is practical, then functions that are only called once can be placed in the body of the function that calls them, reducing global scope pollution. It also means that you do not need to generate the shellcode manually for each system you are targeting: each system's C compiler already knows how to turn self-contained C into assembly, so why do it by hand and ruin the readability of your own code with a bunch of binary blobs?

So the question is: is this practical for functions that are perfectly self-contained (e.g. if they want to call puts, puts has to be passed as an argument, or inside a hash table/struct in an argument)? Or is there some issue preventing this from being practical?

Answer Source

I know that C compilers are capable of taking standalone code, and generate standalone shellcode out of it for the specific system they are targeting.

Turning source into machine code is what compilation is. Shellcode is machine code with specific constraints, none of which apply to this use-case. You just want ordinary machine code like compilers generate when they compile functions normally.

AFAICT, what you want is exactly what you get from static int foo(int x){ ...; }, and then passing foo as a function pointer, i.e. a block of machine code with a label attached, in the code section of your executable.
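As a minimal sketch of that approach (the names add_three and apply are made up for illustration and do not come from the question), a file-local function can be handed around as an ordinary function pointer:

```c
#include <stdio.h>

/* `static` gives the function internal linkage, so its name is only
 * visible inside this source file and does not pollute the global scope. */
static int add_three(int x)
{
    return x + 3;
}

/* Hypothetical consumer: any API that takes a callback works the same way. */
static int apply(int (*fn)(int), int value)
{
    return fn(value);
}

/* Small demo helper that prints the result of going through the pointer. */
static void demo_apply(void)
{
    printf("%d\n", apply(add_three, 4));
}
```

Calling apply(add_three, 4) goes through the pointer and yields 7. The compiler emits add_three once in the code section, which is already executable, so no array-of-bytes trick is needed.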

Jumping through hoops to get compiler-generated machine code into an array is not even close to worth the portability downsides (esp. in terms of making sure the array is in executable memory).

It seems the only thing you're trying to avoid is having a separately-defined function with its own name. That's an incredibly small benefit that doesn't come close to justifying doing anything like you're suggesting in the question. AFAIK, there's no good way to achieve it in ISO C11, but:

Some compilers support nested functions as a GNU extension:

This compiles (with gcc6.2). On Godbolt, I used -xc to compile it as C, not C++. It also compiles with ICC17, but not clang3.9.

#include <stdlib.h>

void sort_integers(int *arr, size_t len)
{
  int bar(){return 3;}  // gcc warning: ISO C forbids nested functions [-Wpedantic]

  int cmp(const void *va, const void *vb) {
    const int *a=va, *b=vb;       // taking const int* args directly gives a warning, which we could silence with a cast
    return *a > *b;               // only 0 or 1: enough for this demo, but qsort's contract wants negative/zero/positive
  }

  qsort(arr, len, sizeof(int), cmp);
}

The asm output is:

cmp.2286:
    mov     eax, DWORD PTR [rsi]
    cmp     DWORD PTR [rdi], eax
    setg    al
    movzx   eax, al
    ret
sort_integers:
    mov     ecx, OFFSET FLAT:cmp.2286
    mov     edx, 4
    jmp     qsort

Notice that no definition for bar() was emitted, because it's unused.
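One caveat about the comparator above: *a > *b only ever returns 0 or 1, while qsort expects a negative, zero, or positive result. A sketch of the same sort in plain ISO C, using a file-scope static comparator with a three-way result (the names cmp_int_three_way and sort_integers_portable are illustrative, not from the original code):

```c
#include <stdlib.h>

/* Three-way comparison without the overflow risk of `*a - *b`:
 * yields -1, 0, or 1. */
static int cmp_int_three_way(const void *va, const void *vb)
{
    const int *a = va, *b = vb;
    return (*a > *b) - (*a < *b);
}

/* Same job as sort_integers, but without the nested-function extension. */
void sort_integers_portable(int *arr, size_t len)
{
    qsort(arr, len, sizeof arr[0], cmp_int_three_way);
}
```

The only cost compared with the nested version is that the comparator has its own file-local name.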

BTW, nested functions can even access variables in their parent (like lambdas). Changing cmp into a function that does return len results in this highly surprising asm:

void call_callback(int (*cb)()) {
  cb();
}

void foo(int *arr, size_t len) {
  int access_parent() { return len; }
  call_callback(access_parent);
}

## gcc5.4
access_parent.2450:
    mov     rax, QWORD PTR [r10]
    ret
call_callback:
    xor     eax, eax
    jmp     rdi
foo:
    sub     rsp, 40
    mov     eax, -17599
    mov     edx, -17847
    lea     rdi, [rsp+8]
    mov     WORD PTR [rsp+8], ax
    mov     eax, OFFSET FLAT:access_parent.2450
    mov     QWORD PTR [rsp], rsi
    mov     QWORD PTR [rdi+8], rsp
    mov     DWORD PTR [rdi+2], eax
    mov     WORD PTR [rdi+6], dx
    mov     DWORD PTR [rdi+16], -1864106167
    call    call_callback
    add     rsp, 40
    ret

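The x86-64 SysV ABI says that r10 is the "static chain pointer", for languages that use it. At first glance access_parent reads r10 without anything ever having written it, but the stores in foo are building a small piece of machine code (a trampoline) on the stack. The constants are instruction bytes: -17599 is 41 BB (mov r11d, imm32, followed by the address of access_parent), -17847 is 49 BA (movabs r10, imm64, followed by a pointer to foo's stack frame, where len was stored), and -1864106167 is 49 FF E3 90 (jmp r11; nop). call_callback receives a pointer to that trampoline in rdi, so r10 is set on the way into access_parent. (This is from gcc5.4 -O3; icc17 does the same thing if call_callback is declared with noinline.)

So anyway, actually making use of the nested nature and accessing outer scopes means running machine code from the stack, which requires an executable stack. It should be cheap and safe if you don't take advantage of that, but then there's not much point to this vs. a separate static function.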
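If a callback does need state from its caller, one portable alternative is to pass that state explicitly through a context pointer, the same pattern pthread_create uses with its void *arg parameter. The sketch below is only an illustration under that assumption; visit_fn, for_each_int, sum_ctx, add_to_sum and sum_ints are all hypothetical names:

```c
#include <stddef.h>

/* Hypothetical callback type: one element plus caller-supplied context. */
typedef void (*visit_fn)(int value, void *ctx);

/* Calls visit() once per element, forwarding the opaque context pointer. */
static void for_each_int(const int *arr, size_t len, visit_fn visit, void *ctx)
{
    for (size_t i = 0; i < len; i++)
        visit(arr[i], ctx);
}

/* The "captured variables" live in an ordinary struct owned by the caller. */
struct sum_ctx {
    long total;
};

/* Separate static function standing in for the nested one. */
static void add_to_sum(int value, void *ctx)
{
    struct sum_ctx *s = ctx;
    s->total += value;
}

static long sum_ints(const int *arr, size_t len)
{
    struct sum_ctx s = { 0 };
    for_each_int(arr, len, add_to_sum, &s);
    return s.total;
}
```

This is more typing than a lambda, but it is plain ISO C and every function involved is ordinary compiled code in the code section.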