mrn mrn - 1 year ago 122
Linux Question

How does the Linux kernel switch between the user-mode and kernel-mode stacks?

How does the Linux kernel switch between the user-mode and kernel-mode stacks when a system call or an interrupt occurs? I mean, what is the exact mechanism: what happens to the user-mode stack pointer, and where does the kernel-mode stack pointer come from? What is done by hardware and what must be done by software?

Answer Source

Everything below is about x86.

I will describe the entire syscall path, which covers the requested information.

First of all, you need to understand what the interrupt descriptor table (IDT) is. This table stores the addresses of the exception and interrupt handlers. A system call is delivered through it as a software-generated exception. To raise an exception, user code executes the

int x

assembly instruction. Each exception, including the system call, has its own number. On x86 Linux the system call looks like

int 0x80
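From C you rarely write the trap instruction yourself. As a minimal sketch, assuming Linux with a libc that provides the generic syscall() wrapper, the function below (raw_getpid is a name made up for this illustration) asks the kernel for the process ID by system call number. Which entry instruction the wrapper ends up executing (int 0x80 or a newer fast-entry instruction) depends on the architecture and libc, so treat this as one way to trigger the path described here, not as the only one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the getpid system call by number.
 * syscall() loads the number and arguments into registers and then
 * executes the platform's kernel-entry instruction on our behalf. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid has no failure case */
    return pid;
}
```

Calling raw_getpid() should return the same value as the ordinary getpid() library function, since both end up in the same kernel handler.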

The int instruction is a complex, multi-step instruction. Here is what it does:

1.) Extracts the descriptor from the IDT (the IDT address is stored in a special register, IDTR) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the low two bits of the CS register. DPL is the descriptor privilege level, stored in the IDT descriptor. As a consequence, you can't raise some exceptions (e.g. page fault) from user space directly with the int instruction. If you try, you get a general protection exception.

2.) The processor switches to the stack defined in the TSS. The kernel initialized the TSS earlier, so it already contains the SS and ESP values (the SS0 and ESP0 fields) that describe the kernel stack. This switch happens only when the privilege level changes, as it does when coming from user mode. So now ESP points to the kernel stack.

3.) The processor pushes the user-space registers onto the newly selected kernel stack: ss, esp, eflags, cs, eip. They are needed to return to user space once the syscall has been served.

4.) Next, the processor loads CS and EIP from the IDT descriptor. This address is the entry point of the exception handler.

5.) At this point we are executing the syscall exception handler in the kernel.
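The steps above can be sketched as a small user-space model in C. This is a toy simulation, not kernel or hardware code: the struct layouts, the mem array, and the names soft_int and push32 are all invented for illustration. It also simplifies in two labeled ways: the target privilege level is taken from the low bits of the gate's selector, and flag changes made by real hardware on entry are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct gate { uint16_t cs; uint32_t eip; uint8_t dpl; };    /* IDT entry */
struct tss  { uint16_t ss0; uint32_t esp0; };               /* kernel stack */
struct cpu  { uint16_t cs, ss; uint32_t eip, esp, eflags; };

#define MEM_WORDS 1024
uint32_t mem[MEM_WORDS];   /* toy memory; esp is a byte address into it */

static void push32(struct cpu *c, uint32_t value)
{
    c->esp -= 4;
    assert(c->esp / 4 < MEM_WORDS);
    mem[c->esp / 4] = value;
}

/* Model of "int vec". Returns false where real hardware would raise a
 * general protection exception instead of delivering the interrupt.
 * c->eip is assumed to already point past the int instruction. */
bool soft_int(struct cpu *c, const struct gate *idt, size_t idt_len,
              const struct tss *t, unsigned vec)
{
    if (vec >= idt_len)
        return false;                  /* vector beyond the IDT limit */

    const struct gate *g = &idt[vec];
    unsigned cpl = c->cs & 3;          /* CPL is the low two bits of CS */
    if (cpl > g->dpl)
        return false;                  /* step 1: CPL <= DPL must hold */

    uint16_t old_ss = c->ss;
    uint32_t old_esp = c->esp;
    if ((unsigned)(g->cs & 3) < cpl) { /* simplification: selector RPL */
        c->ss = t->ss0;                /* step 2: load stack from TSS */
        c->esp = t->esp0;
        push32(c, old_ss);             /* step 3: save the user stack */
        push32(c, old_esp);
    }
    push32(c, c->eflags);              /* step 3: rest of the frame */
    push32(c, c->cs);
    push32(c, c->eip);

    c->cs = g->cs;                     /* step 4: jump to the handler */
    c->eip = g->eip;
    return true;                       /* step 5: now "in the kernel" */
}
```

With a gate whose DPL is 3 installed at vector 0x80, a call from a CPU state whose CS has low bits 3 succeeds and leaves five words on the kernel stack; the same call on a gate whose DPL is 0 returns false and leaves the CPU state untouched.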

And a few words about ARM. ARM doesn't have a TSS; it has banked per-mode registers. So SVC and USR modes have separate stack pointers. If you are interested, you can take a look at the trap entry code.

Interesting links: MIT JOS lab 3, XV6 manual
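To make the banking idea concrete, here is a toy C model, assuming the classic 32-bit ARM mode scheme and reduced to just the USR and SVC modes. The struct, the field names, and the functions visible_sp and take_svc are invented for this sketch; real hardware saves the whole status register, which is reduced here to the mode alone.

```c
#include <assert.h>
#include <stdint.h>

enum arm_mode { ARM_USR, ARM_SVC, ARM_MODE_COUNT };

/* Hypothetical, heavily simplified CPU state. */
struct arm_cpu {
    enum arm_mode mode;          /* stands in for the CPSR mode bits */
    enum arm_mode saved_mode;    /* stands in for SPSR_svc */
    uint32_t sp[ARM_MODE_COUNT]; /* banked: one stack pointer per mode */
    uint32_t lr_svc;             /* banked link register of SVC mode */
    uint32_t pc;
};

/* The stack pointer that instructions see depends only on the mode. */
uint32_t visible_sp(const struct arm_cpu *c)
{
    return c->sp[c->mode];
}

/* Model of taking a supervisor call from user mode. Nothing is copied
 * between stacks: changing the mode selects the other banked SP. */
void take_svc(struct arm_cpu *c, uint32_t handler)
{
    assert(c->mode == ARM_USR);  /* the sketch only covers USR -> SVC */
    c->lr_svc = c->pc;           /* return address for the handler */
    c->saved_mode = c->mode;
    c->mode = ARM_SVC;
    c->pc = handler;
}
```

After take_svc, visible_sp returns the SVC-mode value while the user-mode stack pointer stays intact in its own bank, which is why no TSS-like structure is needed to find the kernel stack.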
