How does the Linux kernel switch between the user-mode and kernel-mode stacks when a system call or an interrupt occurs? I mean, what is the exact mechanism: what happens to the user-mode stack pointer, and where does the kernel-mode stack pointer come from? What is done by hardware and what must be done by software?
Everything below is about x86.
I will describe the entire syscall path, which covers the requested information.
First of all, you need to understand what the interrupt descriptor table (IDT) is. This table stores the addresses of the exception and interrupt handlers. A system call is an exception (a software-generated one). To raise it, user code executes the int assembly instruction. Each exception, including the system call, has its own number. On 32-bit x86 Linux the system call vector is 0x80, so the instruction is int 0x80.
The int instruction is a complex, multi-step instruction. Here is what it does:
1.) The processor fetches the descriptor from the IDT (the IDT address is stored in a special register) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the CS register. DPL is stored in the IDT descriptor. As a consequence, you can't generate some exceptions (e.g. a page fault) directly from user space with the int instruction. If you try, you will get a general protection exception.
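2.) Because the handler runs at a more privileged level than the interrupted user code, the processor switches to the stack defined in the TSS. The TSS was initialized earlier by the kernel and already contains the SS and ESP values of the kernel stack (the SS0 and ESP0 fields). So now ESP points to the kernel stack. The switch itself is done by hardware; software only has to keep those TSS fields pointing at the right stack.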
3.) The processor pushes the user-space register values onto the newly selected kernel stack:
ss, esp, eflags, cs, eip. They are needed to return to user space after the syscall has been served.
4.) Next, the processor loads CS and EIP from the IDT descriptor. This address is the entry point of the exception handler.
5.) Execution now continues in the kernel, at the syscall exception handler.
And a few words about ARM. ARM doesn't have a TSS; it has banked per-mode registers. So the SVC and USR modes have separate stack pointers. If you are interested, you can take a look at the trap entry code.