Dimitar Dimitrov - 6 months ago
Java Question

Implementing an acquire for a release from Unsafe.putOrdered*()?

What do you think is the best correct way for implementing the acquire part of a release/acquire pair in Java?

I'm trying to model some of the actions in an application of mine using classic release/acquire semantics (without StoreLoad and without sequential consistency across threads).

There are a couple of ways to achieve the rough equivalent of a store-release in the JDK. java.util.concurrent.atomic.Atomic*.lazySet() and the underlying sun.misc.Unsafe.putOrdered*() are the most often cited approaches. However, there's no obvious way to implement a load-acquire.


  • The JDK APIs which allow lazySet() mostly use volatile variables internally, so their store-releases are paired with volatile loads. In theory, volatile loads should be more expensive than load-acquires, and should not provide anything more than a pure load-acquire in the context of a preceding store-release.

  • sun.misc.Unsafe does not provide getAcquire*() equivalents of the putOrdered*() methods, even though such acquire methods are planned for the upcoming VarHandles API.

  • Something that sounds like it would work is a plain load followed by sun.misc.Unsafe.loadFence(). It's somewhat disconcerting that I haven't seen this anywhere else. This may be related to the fact that it's a pretty ugly hack.
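For illustration, here is a minimal sketch of the pairing the bullets above are asking about. It does not use sun.misc.Unsafe; it assumes Java 9 or later, where the VarHandle API has shipped, and uses VarHandle.setRelease() in the role of putOrdered*() and VarHandle.acquireFence() in the role of Unsafe.loadFence(). The class, field, and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder class; all names here are illustrative only.
final class ReleaseAcquireBox {
    private int payload; // plain field, published through the flag
    private int ready;   // accessed only through the VarHandle below

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireBox.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producer: plain store of the payload, then a store-release of the flag
    // (the role lazySet()/putOrdered*() play in the question).
    void publish(int value) {
        payload = value;
        READY.setRelease(this, 1);
    }

    // Consumer, variant 1: a load-acquire of the flag.
    // Returns the payload, or -1 if nothing has been published yet.
    int tryReadAcquire() {
        if ((int) READY.getAcquire(this) == 1) {
            return payload;
        }
        return -1;
    }

    // Consumer, variant 2: a plain load of the flag followed by an explicit
    // acquire fence, mirroring the "plain load + loadFence()" idea.
    int tryReadWithFence() {
        int flag = (int) READY.get(this);
        VarHandle.acquireFence();
        return flag == 1 ? payload : -1;
    }
}
```

Neither variant involves a StoreLoad barrier; whether variant 2 is acceptable style is exactly the open question raised above.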



P.S. I understand well that these mechanisms are not covered by the JMM, that they are not sufficient for maintaining sequential consistency, and that the actions they create are not synchronization actions (for example, I understand that they break IRIW). I also understand that the store-releases provided by Atomic*/Unsafe are most often used either for eagerly nulling out references or in producer/consumer scenarios, as an optimized message passing mechanism for some important index.
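The producer/consumer use of lazySet() mentioned in the P.S. could look roughly like the sketch below: a single-producer/single-consumer ring buffer whose indices are published with AtomicLong.lazySet() and read with the ordinary volatile get(). The class name and its layout are hypothetical, and it assumes exactly one producer thread and one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer/single-consumer queue; names are illustrative.
final class SpscIntQueue {
    private final int[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscIntQueue(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Called by the single producer thread only. Returns false when full.
    boolean offer(int value) {
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;
        }
        buffer[(int) (t & mask)] = value; // plain store of the element
        tail.lazySet(t + 1);              // store-release publishes the element
        return true;
    }

    // Called by the single consumer thread only. Returns null when empty.
    Integer poll() {
        long h = head.get();
        if (h == tail.get()) {            // volatile load serves as the acquire
            return null;
        }
        int value = buffer[(int) (h & mask)];
        head.lazySet(h + 1);              // hands the slot back to the producer
        return value;
    }
}
```

The acquire side here is still a volatile load, which is the pairing the question would like to weaken.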

Answer

Volatile read is exactly what you are looking for.

In fact, the corresponding volatile operations already have release/acquire semantics (otherwise happens-before would not be possible for a paired volatile write and read). However, paired volatile operations must not only be ordered by happens-before; they must also take part in the total synchronization order. That's why a StoreLoad barrier is inserted after a volatile write: it guarantees a total order of volatile writes to different locations, so that all threads see those values in the same order.

A volatile read has acquire semantics: the proof is in the HotSpot codebase, and Doug Lea recommends the same directly in the JSR-133 Cookbook (LoadLoad and LoadStore barriers after each volatile read).

Unsafe.loadFence() also has acquire semantics (proof), but it is not used to read a value (you can do that with an ordinary volatile read). It is used to prevent plain reads from being reordered with a subsequent volatile read. This is how StampedLock implements optimistic reading (see the StampedLock#validate method implementation and its usages).
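To make the StampedLock case concrete, here is a small sketch of an optimistic read using only the public StampedLock API (tryOptimisticRead, validate, readLock, writeLock). The OptimisticPoint class is hypothetical and modeled on the usual point example; the fence itself lives inside validate(), so it does not appear in user code.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical example class; only the StampedLock calls are real API.
final class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y; // plain fields guarded by the lock

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();
        double cx = x, cy = y;          // plain reads, possibly racy
        if (!lock.validate(stamp)) {    // validate() orders the plain reads above
                                        // before its re-check of the lock state
            stamp = lock.readLock();    // fall back to a real read lock
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }
}
```

If a writer intervened between tryOptimisticRead() and validate(), the values read optimistically are discarded and read again under the read lock.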

Update after discussion in comments.

Let's check whether Unsafe#loadFence() and a volatile read are the same and have acquire semantics.

I'm looking at the HotSpot C1 compiler source code to avoid reading through all the optimizations in C2. C1 transforms bytecode (in fact, not bytecode itself, but its interpreter representation) into LIR (Low-Level Intermediate Representation) and then translates the graph into actual opcodes, depending on the target microarchitecture.

Unsafe#loadFence is an intrinsic with the alias _loadFence. The C1 LIR generator emits this for it:

case vmIntrinsics::_loadFence :
if (os::is_MP()) __ membar_acquire();

where __ is a macro used for LIR generation.

Now let's look at the volatile read implementation in the same LIR generator. It tries to insert null checks, checks for IRIW support, checks whether we are on a 32-bit platform and trying to read a 64-bit value (to do some magic with SSE/FPU) and, finally, leads us to the same code:

if (is_volatile && os::is_MP()) {
    __ membar_acquire();
}

The assembler generator then inserts the platform-specific acquire instruction(s) here.

Looking at the specific implementations (no links here, but all of them can be found in src/cpu/{$cpu_model}/vm/c1_LIRAssembler_{$cpu_model}.cpp):

  • SPARC

    void LIR_Assembler::membar_acquire() {
        // no-op on TSO
    }
    
  • x86

    void LIR_Assembler::membar_acquire() {
        // No x86 machines currently require load fences
    }
    
  • AArch64 (weak memory model, so barriers should be present)

    void LIR_Assembler::membar_acquire() {
        __ membar(Assembler::LoadLoad|Assembler::LoadStore);
    }
    

    According to the AArch64 architecture description, such a membar is compiled to a dmb ishld instruction after the load.

  • PowerPC (also weak memory model)

    void LIR_Assembler::membar_acquire() {
        __ acquire();
    }
    

    which is then emitted as the PowerPC instruction lwsync. According to the comments in the source:

    lwsync orders Store|Store, Load|Store, Load|Load, but not Store|Load

    So lwsync is stronger than a pure acquire, but since PowerPC has no weaker barrier, it is the only choice for implementing acquire semantics there.

Conclusions

Volatile reads and Unsafe#loadFence() are equivalent in terms of memory ordering (though perhaps not in terms of possible compiler optimizations). On x86, the most popular architecture, the acquire barrier is a no-op, and PowerPC is the only supported architecture that has no precise acquire barrier.