Grzegorz Szpetkowski - 9 months ago
Linux Question

Responsibility for stack alignment in x86 assembly

I am trying to get a clear picture of who (caller or callee) is responsible for stack alignment. The case for 64-bit assembly is rather clear: it is the caller.

Referring to System V AMD64 ABI, section 3.2.2 The Stack Frame:

The end of the input argument area shall be aligned on a 16 (32, if
__m256 is passed on stack) byte boundary.

In other words, it should be safe to assume that at the entry point of every called function:

16 | (%rsp + 8)

holds (the extra eight is because call implicitly pushes the return address onto the stack).
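
The entry condition can be written as a small check in C. This is only a sketch: the function name aligned_at_entry is hypothetical, and the stack pointer value is passed in as a plain integer rather than read from the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a stack pointer value observed at a
 * function's entry point satisfies 16 | (rsp + 8), i.e. the stack was
 * 16-byte aligned just before `call` pushed the 8-byte return address. */
bool aligned_at_entry(uint64_t rsp)
{
    return (rsp + 8) % 16 == 0;
}
```

For instance, a value ending in ...008 passes the check, while a value ending in ...010 does not.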

How does it look in the 32-bit world (assuming cdecl)? I noticed that gcc
places the alignment inside the called function with the following construct:

and esp, -16

which seems to indicate that it is the callee's responsibility.
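
As a rough illustration of what that instruction computes, here is the same masking operation in C. The name align_down_16 is hypothetical and the value is treated as a plain 32-bit integer; this is a sketch of the arithmetic, not of how a compiler emits it.

```c
#include <stdint.h>

/* Hypothetical helper mirroring `and esp, -16`: clears the low four bits,
 * rounding the value down to the nearest multiple of 16. The stack grows
 * downward, so rounding down only ever reserves extra space. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & (uint32_t)-16;   /* -16 converts to 0xFFFFFFF0 */
}
```

An already aligned value is left unchanged, so the instruction is harmless when the caller did align the stack.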

To put it more clearly, consider the following code:

global main
extern printf
extern scanf
section .rodata
    s_fmt db "%d %d", 0
    s_res db `%d with remainder %d\n`, 0
section .text
main:
    enter 0, 0                  ; set up the frame: push ebp / mov ebp, esp
    sub esp, 8
    mov DWORD [ebp-4], 0        ; dividend
    mov DWORD [ebp-8], 0        ; divisor

    lea eax, [ebp-8]
    push eax
    lea eax, [ebp-4]
    push eax
    push s_fmt
    call scanf
    add esp, 12

    mov eax, [ebp-4]
    cdq                         ; sign-extend eax into edx:eax for idiv
    idiv DWORD [ebp-8]

    push edx
    push eax
    push s_res
    call printf

    xor eax, eax
    leave
    ret

Is it required to align the stack before scanf is called? If so, this would require decreasing %esp by four more bytes before pushing the arguments to scanf, because the stack usage would otherwise be:

4 bytes (return address)
4 bytes (%ebp of previous stack frame)
8 bytes (for two variables)
12 bytes (three arguments for scanf)
= 28
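
The tally above can be checked with a few lines of C. This is a sketch under the question's own premise that %esp was 16-byte aligned just before main was called; the helper name padding_to_16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: extra bytes to subtract from the stack pointer so
 * that `used` bytes below a 16-byte-aligned starting point grow to the
 * next multiple of 16. */
uint32_t padding_to_16(uint32_t used)
{
    return (16u - used % 16u) % 16u;
}
```

With used = 4 + 4 + 8 + 12 = 28, the helper returns 4, matching the four bytes mentioned in the question.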

Answer

gcc is just taking a defensive approach with -m32, by not assuming that main is called with a properly-aligned stack.

The i386 System V ABI has guaranteed for years that new 32-bit processes start with ESP aligned on a 16B boundary (e.g. at _start, the ELF entry point), and the glibc CRT code maintains that alignment.

Perhaps an extremely ancient Linux kernel (from before that revision to the i386 ABI, when the required alignment was only 4B) could violate that assumption, and it's only a couple of extra instructions that run once in the lifetime of the process (assuming the program doesn't call main recursively).

Unlike gcc, clang assumes the stack is properly aligned on entry to main. (clang also assumes that narrow args have been sign- or zero-extended to 32 bits, even though the current ABI revision doesn't specify that behaviour (yet). gcc and clang both emit code that does so on the caller side, but only clang depends on it in the callee. This happens in 64-bit code, but I didn't check 32-bit.)

Look at the compiler output for main and for functions other than main if you're curious.

I just updated the ABI links in the tag wiki the other day. The site that used to host the documents is still dead and does not seem to be coming back, so I updated the System V links to point to the PDFs of the current revision in HJ Lu's GitHub repo, and to his page with links.