Alex D Alex D - 5 months ago 49
Linux Question

Why does Linux save %ebp when doing a context switch?

When doing a context switch, x86 Linux (very cleverly) avoids saving and restoring EAX, EBX, ECX, EDX, ESI, and EDI. Of course, the userland values are saved on the kernel stack when switching into kernel mode. But the values in the kernel code are not saved -- instead, GCC directives are used which tell the compiler not to keep any values which are needed in those registers at the point where the switch happens.

Naturally, ESP has to be saved and restored. But this is what I don't understand: before ESP is switched, EBP is pushed on the kernel stack. I would think that EBP was being used as a frame pointer, but in my kernel debugger, the values sure don't look like it:

(gdb) print $esp
$22 = (void *) 0xc0025ec0
(gdb) print $ebp
$23 = (void *) 0xcf827f3c

The difference is way too big for EBP to be a frame pointer here. A comment in the code says that "EBP is saved/restored explicitly for wchan access", but I'm searching the code and can't figure out how that is so. Google isn't helping either. Can some kernel wizard step in and help here?

Answer Source

I have figured it out. EBP is a frame pointer, but at the point that I checked its value, ESP had already been switched to the new process' kernel stack, but EBP had not yet been restored (so it still had the value from the previous process). Sorry!!

The reason for storing the frame pointer is so that others can determine where in the kernel code a process went to sleep. Among other things, this is used by /proc/PID/wchan, which prints the name of the kernel function which made a process sleep.

The code which checks this is as follows (details removed for brevity):

unsigned long get_wchan(struct task_struct *p)
    unsigned long sp, bp, ip;
    sp = p->thread.sp;
    bp = *(unsigned long *) sp;
    do {
        ip = *(unsigned long *) (bp+4);
        if (!in_sched_functions(ip))
            return ip;
        bp = *(unsigned long *) bp;
    } while (count++ < 16);
    return 0;

Since EBP is pushed right before switching kernel stacks, the stack pointer of a sleeping process will point to the saved EBP (frame pointer) value. That frame pointer points to the caller's saved frame pointer, which points to the previous caller's, which points to the previous caller's... in other words, the saved frame pointers form a linked list going back up the call stack.

The frame pointer is saved immediately on function entry, so the value just above it (4 bytes up) is the return address to the calling function.

The loop in get_wchan walks that "linked list" (bp = *bp), checking the return address above each saved frame pointer, until it finds an address within a function like ep_poll or futex_wait_queue_me.

get_wchan just returns an address inside a function; for display in /proc, lookup_symbol_name is used to convert that address into a function name.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download