Chris Cochran - 2 months ago
C++ Question

Does an aborted xbegin transaction restore the stack context that existed at the xbegin start?

I am interested in encapsulating a transactional xbegin and xend inside XBEGIN( ) and XEND( ) functions, in a static assembler lib. However, I am unclear how (or if) the stack gets restored to the original xbegin calling state, given an xabort originating at some other stack level (higher or lower). In other words, is the dynamic stack context (including interrupts effects) managed and rolled back as just another part of the transaction?

This assembler approach is needed for a VC++ 2010 build that doesn't have _xbegin( ) and _xend( ) intrinsics supported or available, and x64 builds cannot use _asm { } inlining.


The Intel instruction-set reference manual entry for xbegin is pretty clear. (See the tag wiki for links to Intel's official PDF, and other stuff.)

On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution and restores architectural state to that corresponding to the outermost XBEGIN instruction. The fallback address following an abort is computed from the outermost XBEGIN instruction.

So the instruction works like a conditional branch, where the branch condition is "did an abort happen before XEND?" e.g.:

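; NASM syntax, I assume MASM is similar
retry:
    ; eax holds abort info, all other architectural state + memory is unchanged
    inc     dword [rel retry_count]   ; or whatever other debug instrumentation you want to add
    ; fall through and try again

global xbegin_wrapper_with_retry
xbegin_wrapper_with_retry:
    xbegin  retry                     ; an abort resumes at retry
    ret                               ; returns to the caller with the transaction active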

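If an abort happens, it's as if all the code that ran after xbegin didn't run at all, just a jump to the fallback address with eax modified. That covers the stack too: rsp is an architectural register and the stack is ordinary memory, so both go back to what they were at the outermost xbegin, no matter how many calls or returns deep the abort happened.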

You might want to do something other than just infinite retries on an abort, of course. This isn't meant to be a real example. (This article does have a real example of the kind of logic you might want to use, using intrinsics. It looks like they just test eax instead of using the xbegin as the jump in an if, unless the compiler optimizes that check. IDK if it's the most efficient way.)
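As a rough illustration of that kind of logic, here is a minimal sketch of a bounded-retry policy in C++. The status values match the ones Intel's intrinsics header defines (_XBEGIN_STARTED, _XABORT_RETRY, _XABORT_CAPACITY), but the names kXbeginStarted, kXabortRetry, kXabortCapacity, TxPath and enter_with_bounded_retry are made up for this example. The begin argument is a stand-in for an assembler XBEGIN wrapper that is assumed to return ~0u when the transaction started and the abort status from eax otherwise. It is written without C++11 features so an older compiler has a chance of accepting it.

```cpp
// Status values as defined for Intel's RTM intrinsics, repeated here because
// the toolchain in question has no intrinsics header for them.
static const unsigned int kXbeginStarted  = ~0u;      // _XBEGIN_STARTED
static const unsigned int kXabortRetry    = 1u << 1;  // _XABORT_RETRY: retry may succeed
static const unsigned int kXabortCapacity = 1u << 3;  // _XABORT_CAPACITY: buffer overflowed

enum TxPath { kTransactional, kFallback };

// Hypothetical helper. `begin` is any callable returning kXbeginStarted when a
// transaction is now active, or the abort status otherwise. Gives up as soon
// as the hardware does not set the RETRY hint, or after max_retries retries.
template <typename Begin>
TxPath enter_with_bounded_retry(Begin begin, int max_retries) {
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        const unsigned int status = begin();
        if (status == kXbeginStarted)
            return kTransactional;        // caller runs the critical section, then XEND
        if ((status & kXabortRetry) == 0)
            break;                        // e.g. capacity abort: retrying is unlikely to help
    }
    return kFallback;                     // caller takes a conventional lock instead
}
```

With a real wrapper, an abort anywhere inside the transaction makes begin() appear to return a second time with the abort status, so the loop sees it as just another return value. The fallback lock itself is left out here.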

What do you mean "interrupts effects"?

You might want to try to get the compiler to emit the three-byte XEND instruction without a function call, so pushing the return address onto the stack isn't part of the transaction. e.g.

// no idea if this is safe, or if it might get reordered by the optimizer
#define xend_MSVC  __asm _emit 0x0F  __asm _emit 0x01  __asm _emit 0xD5

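The XEND instruction itself is valid in 64bit mode (the doc mentions rax), but I'm not sure the __asm _emit trick is: MSVC doesn't support inline asm for x64 targets, which is the limitation mentioned in the question, so this macro may only help 32bit builds. IACA's header file uses __asm _emit for its MSVC markers, but as far as I can tell only for 32bit builds.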

It'll be safer to put XEND in its own wrapper function, too, I guess. You just need a stop-gap until you can upgrade to a compiler with intrinsics, so it doesn't have to be perfect as long as the extra reads/writes from the ret and call don't cause too many aborts.