For reasons too complicated to explain here, I need to run an x86 GCC-compiled Linux program on a platform that is a subset of x86. This platform does not have the %gs register, which means it has to be emulated, because GCC relies on the presence of the %gs register.
Currently I have a wrapper which catches the exceptions when the program attempts to access the %gs register, and emulates it. But this is dog slow. Is there a way that I can patch the opcodes in the ELF ahead of time with equivalent instructions, so that the trap-and-emulate is avoided?
(This is assuming Adam Rosenfield's solution is not applicable. It, or a similar approach, is probably a better way to solve it.)
You haven't stated how you're emulating the %gs register, but it's probably going to be tough to patch every usage in general unless you have some special knowledge about the program, because otherwise you only have 2 bytes (in the worst, and common, case) to modify with your patch. Of course, if you're using something like %es = %gs, it should be relatively straightforward.
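To make the %es = %gs case concrete: for memory accesses the patch is a same-length rewrite, where the GS override prefix 65 becomes the ES override prefix 26. The following is only a minimal sketch. It assumes 32-bit code, assumes the platform really lets you load %es with whatever %gs would have pointed at, and assumes `insn` holds exactly one instruction whose boundaries a real decoder has already found. `retarget_gs_prefix` is a name made up for illustration.

```python
# Legacy prefix bytes that may appear before the opcode of an x86 instruction.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
GS_OVERRIDE = 0x65  # segment override prefix selecting GS
ES_OVERRIDE = 0x26  # segment override prefix selecting ES


def retarget_gs_prefix(insn: bytes) -> bytes:
    """Return a copy of one instruction with a GS override turned into ES.

    Hypothetical helper: only the prefix region is touched, so a 0x65 byte
    that is part of an operand or immediate is left alone.
    """
    out = bytearray(insn)
    for i, b in enumerate(insn):
        if b not in LEGACY_PREFIXES:
            break  # reached the opcode; everything after is not a prefix
        if b == GS_OVERRIDE:
            out[i] = ES_OVERRIDE
    return bytes(out)


# mov eax, gs:[0x14]  becomes  mov eax, es:[0x14]
patched = retarget_gs_prefix(bytes.fromhex("65a114000000"))
```

This does nothing for explicit loads and stores of the GS selector itself (the `mov` forms), which still need separate handling.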
Assuming this can somehow be made to work in your case, the strategy is to scan the executable sections of the ELF file and patch any instruction that uses or modifies the GS register. That is at least the following instructions:
Any instruction prefixed with 65 (except for branch instructions, in which case the prefix indicates something else)
mov r/m16, gs (8C /r)
mov gs, r/m16 (8E /r)
mov gs, r/m64 (REX.W 8E /r) (if you support 64-bit mode)
And any other instructions that allow segment registers (I don't think there are that many more, but I'm not 100% sure).
This is all coming from the Intel® 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 2A and 2B: Instruction Set Reference, A-Z. Be aware that the instructions are sometimes prefixed with other prefixes, sometimes not, so you should probably use a library to do the instruction decoding rather than blindly searching for byte sequences.
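Once a decoder has handed you the bytes of a single instruction, the check for the cases listed above could look roughly like this. It is a sketch, not a decoder: it assumes `insn` is exactly one instruction, and `uses_gs` is a hypothetical name. Besides the `mov` forms it also flags `push gs` (0F A8) and `pop gs` (0F A9), which belong to the "other instructions" category. It deliberately does not implement the branch-instruction exception for the 65 prefix, so a prefixed branch would be reported as a hit and needs a manual look.

```python
# Legacy prefix bytes that may appear before the opcode of an x86 instruction.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
GS_OVERRIDE = 0x65  # segment override prefix selecting GS
GS_SREG = 5         # segment register numbering: ES=0 CS=1 SS=2 DS=3 FS=4 GS=5


def uses_gs(insn: bytes, mode64: bool = False) -> bool:
    """Report whether one already-delimited instruction touches GS."""
    i = 0
    seen_gs_prefix = False
    # Walk the legacy prefixes; they can come in any order.
    while i < len(insn) and insn[i] in LEGACY_PREFIXES:
        if insn[i] == GS_OVERRIDE:
            seen_gs_prefix = True
        i += 1
    # In 64-bit mode a REX prefix (40..4F) may sit right before the opcode.
    if mode64 and i < len(insn) and 0x40 <= insn[i] <= 0x4F:
        i += 1
    if i >= len(insn):
        return False
    opcode = insn[i]
    has_next = i + 1 < len(insn)
    # mov r/m, Sreg (8C /r) and mov Sreg, r/m (8E /r):
    # the reg field of the ModRM byte names the segment register.
    if opcode in (0x8C, 0x8E) and has_next:
        if (insn[i + 1] >> 3) & 0b111 == GS_SREG:
            return True
    # push gs (0F A8) and pop gs (0F A9).
    if opcode == 0x0F and has_next and insn[i + 1] in (0xA8, 0xA9):
        return True
    return seen_gs_prefix
```

For example, `uses_gs(bytes.fromhex("65a114000000"))` flags the typical `mov eax, gs:[0x14]`, while `uses_gs(bytes.fromhex("8ed8"))` (`mov ds, ax`) does not.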
Some of the above instructions should be relatively straightforward to turn into call my_patch or similar, but you're probably going to have trouble finding something that fits in two bytes and works in general. int XX (CD XX) might be a good candidate if you can set up an interrupt vector, but I'm not sure it's going to be faster than the method you're currently using. You will of course need to record which instruction was patched out and have the interrupt handler (or whatever) react differently depending on the return address (that your handler receives).
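The bookkeeping for the int XX variant might be sketched like this. Everything here is an assumption for illustration: the function name `patch_with_int`, the vector number 0x7F, the load address, and the choice to pad the rest of a longer instruction with NOPs so that returning right after the `int` is harmless. The table is keyed by the return address the handler will see, which is the address of the byte following the two-byte `int XX`.

```python
INT_OPCODE = 0xCD  # first byte of `int imm8`
NOP = 0x90


def patch_with_int(code: bytearray, offset: int, length: int,
                   vector: int, base: int, table: dict) -> None:
    """Overwrite one instruction in `code` with `int vector` plus NOP padding.

    `offset`/`length` locate the instruction inside `code`, `base` is the
    (assumed) address at which `code` is loaded, and `table` maps the return
    address seen by the interrupt handler to the original instruction bytes.
    """
    if length < 2:
        raise ValueError("int XX needs at least two bytes")
    if not 0 <= vector <= 0xFF:
        raise ValueError("interrupt vector must fit in one byte")
    original = bytes(code[offset:offset + length])
    code[offset] = INT_OPCODE
    code[offset + 1] = vector
    for i in range(offset + 2, offset + length):
        code[i] = NOP
    # The handler receives the address of the byte after `int XX`.
    table[base + offset + 2] = original


# Patch `mov eax, gs:[0x14]` followed by `ret`, loaded at a made-up address.
code = bytearray.fromhex("65a114000000c3")
patched_sites = {}
patch_with_int(code, 0, 6, 0x7F, 0x08048000, patched_sites)
```

The handler would then look up its return address in `patched_sites`, emulate the recorded instruction, and return into the NOP padding.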
You might be able to set up a trampoline if you can find room within -128..127 bytes and use JMP rel8 (EB cb) to jump to the trampoline (usually another JMP, but this time with more room for the target address), which then handles the instruction emulation and jumps back to the instruction following the patched-out %gs usage.
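Computing the jump bytes is mostly a matter of getting the displacement base right: both forms are relative to the address of the instruction after the jump, so the -128..127 window is measured from the patch address plus 2. A small sketch, with made-up function names and addresses, covering the short jump at the patch site and a JMP rel32 (E9 cd) as the "more room" jump inside the trampoline:

```python
import struct


def encode_jmp_rel8(src: int, dst: int) -> bytes:
    """Encode `JMP rel8` (EB cb) placed at address `src`, jumping to `dst`."""
    rel = dst - (src + 2)  # displacement counts from the end of the 2-byte jump
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return struct.pack("<Bb", 0xEB, rel)


def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `JMP rel32` (E9 cd) placed at address `src`, jumping to `dst`."""
    rel = dst - (src + 5)  # displacement counts from the end of the 5-byte jump
    if not -2**31 <= rel < 2**31:
        raise ValueError("target out of range for a near jump")
    return struct.pack("<Bi", 0xE9, rel)


# Patch site at 0x1000, trampoline slot at 0x1010, emulation code at 0x2000.
short_jump = encode_jmp_rel8(0x1000, 0x1010)
long_jump = encode_jmp_rel32(0x1010, 0x2000)
```

The emulation code would end with a similar jump back to the address right after the patched-out instruction.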
Lastly, I'd recommend keeping the trap-and-emulate code running to catch any cases you might not have thought of (self-modifying or injected code, for instance). This way you can also log any unhandled cases and add them to your solution.