Connor Connor - 2 months ago 8
C Question

What parts of this HelloWorld assembly code are essential if I were to write the program in assembly?

I have this short hello world program:

#include <stdio.h>

static const char* msg = "Hello world";

int main(){
printf("%s\n", msg);
return 0;
}


I compiled it into the following assembly code with gcc:

.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello world"
.data
.align 4
.type msg, @object
.size msg, 4
msg:
.long .LC0
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
movl msg, %eax
movl %eax, (%esp)
call puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits


My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don't understand. For instance, I don't know what .cfi* is, and I'm wondering if I would need to include this to write this program in assembly.

Answer

The absolute bare minimum that will work on the platform that this appears to be, is

        .globl main
main:
        pushl   $.LC0
        call    puts
        addl    $4, %esp
        xorl    %eax, %eax
        ret
.LC0:
        .string "Hello world"

But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is

        .globl  main
        .type   main, @function
main:
        subl    $24, %esp
        pushl   $.LC0
        call    puts
        xorl    %eax, %eax
        addl    $28, %esp
        ret
        .size main, .-main
        .section .rodata
.LC0:
        .string "Hello world"

Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.

The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.

It is probably worth mentioning that in the "AT&T" assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines: A line with a single token on it, ending in a colon, is a label. (I don't remember the rules for what characters can appear in labels.) A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler. Anything else is an assembly instruction.