Austin Copeland Austin Copeland - 1 month ago 14
C Question

Beginner's confusion about x86 stack

I'm a 17-year-old highschool student that wanted to learn how to hack, but then found out that he really likes learning about x86, yesterday.
With that said, I have no formal training and I'm just going off of what I'm reading on the internet.
Here is an image I drew in reference to my question:



First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.

I know that conceptually, the stack is like a Coke bottle. The sugar is at the bottom and you fill it up to the top. With this in mind, how does the Call tell the EIP register to "target" the called function if the EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "Code Segment of RAM" (the place where functions are kept) is the place where the EIP register is.

I'm lost.

Answer

Typically, a computer program uses four kinds of memory areas (also called sections or segments):

  • The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. This would better be called "code" section, but the name has historical reasons.
  • The data section: This contains variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
  • The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and popping from the stack. This is also called "LIFO": last in first out. This is where local variables of a function reside. If a function complets, the data is removed from the stack and is lost (basically).
  • The heap: This is also a dynamic memory region. There are special function in the programming language which "allocate" (reserve) a piece of this area on request of the program. Another function is available to return this area to the heap if it is not required anymore. As the data is released explicitly, it can be used to store data which lives longer than just a function call (different from the stack).

The data for text and data section are stored in the program file (they can be found in Linux for example using objdump (add a . to the names). stack and heap are not stored anywhere in the file as they are allocated dynamically (on-demand) by the program itself.

Normally, after the program has been loaded, the memory area reamining is treated as a single large block where both, stack and heap are located. They start from opposite end of that area and grow towards each other. For most architectures the heap grows from low to high memory addresses (ascending) and the stack downwards (decending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt (change foreign data) the heap or vice versa. This may result in any kind of errors, depending how/what data has changed. If the stack gets corrupted, this may result in the program going wild (this is actually one way a trojan might work). Modern operating systems, however should take measures to detect this situation before it becomes critical.

This is not only for x86, but also for most other CPU families and operating system, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS-X, iOS, Android (which uses Linux OS), DOS. For microcontrollers, there is often no heap (all memory is allocated at run-time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.


Disclaimer: This is very simplified, so please no comments like "how about bss, const, myspecialarea";-) . There also is not requirement from the C standard for these areas, specifically to use a heap or a stack. Indeed there are implementations which don't use either. Those are most times embedded systems with small (8 or 16 bit) MCUs or DSPs. Also modern architectures use CPU registers instead of the stack to pass parameters and keep local variables. Those are defined in the Application Binary Interface of the target platform.


For the stack, you might read the wikipedia article. Note the difference in implementation between the datatstructure "stack" and the "hardware stack" as implemented in a typical (micro)processor.