Many programming languages today have happens-before relation and release+acquire synchronization operations.
Some of these programming languages:
- C/C++11: happens-before, release+acquire
- Rust and Swift adopted the C/C++ memory model in its entirety — so they have that too.
- Java: happens-before, release+acquire.
I would like to know if release+acquire can violate happens-before:
- if it's possible, then I would like to see an example
- if it's impossible, then I would like to get simple and clear explanations why
What is release+acquire and happens-before
Release/acquire establishes happens-before relation between different threads: in other words everything before release in Thread 1 is guaranteed to be visible in Thread 2 after acquire:
\ Thread 1 /
\ -------- /
\ x = 1 / Everything
\ y = 2 / here...
\ write-release(ready = true) /
└───────────────────────────┘
|
└─────────────┐ (happens-before)
V
┌─────────────────────────┐
/ Thread 2 \
...is visible to / -------- \
everything / read-acquire(ready == true) \
here / assert(x == 1) \
/ assert(y == 2) \
More than that, happens-before is a strict partial order.
This means it is:
- transitive:
Thread 2is guaranteed to see not only writes made byThread 1, but also all writes by other threads thatThread 1saw - asymmetric: either
ahappens-beforeb, orbhappens-beforea, both isn't allowed
Why I think that release/acquire might break happens-before
As we know from IRIW litmus test, release/acquire could cause two threads to see writes from different threads in different order (for C++ see also the last example here, and these two examples from gcc wiki):
// Thread 1
x.store(1, memory_order_release);
// Thread 2
y.store(1, memory_order_release);
// Thread 3
assert(x.load(memory_order_acquire) == 1 && y.load(memory_order_acquire) == 0)
// Thread 4
assert(y.load(memory_order_acquire) == 1 && x.load(memory_order_acquire) == 0)
Here both asserts can pass, which means that Thread 3 and Thread 4 see writes to x and y in different order.
As I understand, if it were ordinary variables, then this would violate the asymmetry property of happens-before.
But because x and y are atomics it's OK.
(BTW I am not sure about that)
Nate Eldredge demonstrated in his answer that this IRIW example is OK.
But I still have a sneaking suspicion that there might be something similar to IRIW which would cause Thread 3 and Thread 4 to see regular writes to happen-before in different order — this would break happens-before (it wouldn't be transitive anymore).
Note1
In cppreference there is also this quote:
The implementation is required to ensure that the happens-before relation is acyclic, by introducing additional synchronization if necessary (it can only be necessary if a consume operation is involved, see Batty et al)
The quote hints that there might be cases when happens-before is violated and which require additional synchronization ("acyclic" means that happens-before forms a directed acyclic graph, which is equivalent to saying "strict partial order").
If it's possible I would like to know what are these cases.
Note2
Since java allows data races, I'm also interested in cases when happens-before is violated only in presence of data races.
Edit 1 (03 Nov 2021)
To give you an example, here is a explanation why sequentially consistent (SC) atomics cannot violate happens-before.
(A similar explanation for release/acquire atomics would be an answer to my question).
By "violate happens-before" I mean "to violate axioms of happens-before, which is a strict partial order".
Strict partial orders correspond directly to directed acyclic graphs (DAGs).
Here is an example of a DAG from wiki (note that it has no cycles):
Let's show that with SC atomics happens-before graph stays acyclic.
Remember that SC atomics happen in a global order (the same for all threads), and:
- the order is consistent with the order of actions within each thread
- every SC atomic read sees the latest SC atomic write in this total order to the same variable
Look at this happens-before graph:
Thread1 Thread2 Thread3
======= ======= =======
│ │ │
W(x) │ │
↓ │ │
Sw(a) ┐ │ W(y)
│ │ │ ↓
│ └> Sr(a) ┌ Sw(b)
│ ↓ │ │
│ Sr(b)<─┘ │
│ ↓ │
│ R(x) │
│ ↓ │
│ R(y) │
│ │ │
V V V
On the graph:
- time flows downwards
W(x)andR(x)are regular actions: write and read ofxSw(a)andSr(a)are SC atomics: write and read ofa- within each thread actions happen in a program order (aka
sequenced-before orderin C++): in the order they go in the code - between threads
happens-beforeis established by SC atomics
Note that arrows on the graph always go downwards
=> the graph cannot have cycles
=> it's always a DAG
=> happens-before axioms cannot be violated
The same proof doesn't work for release/acquire atomics because (as far as I understand) they don't happen in a global order => a HB arrow between Sw(a) and Sr(a) might go upwards => a cycle is possible. (I'm not sure about that)
Answer: no, release/acquire cannot break happens-before.
The proof is given by Nate Eldredge in this comment:
I just would like to put it in a separate answer to make it easier for people to notice.
Plus here are additional explanations from me (maybe it will make it easier to understand for some people).
First of all, if we use only
release/acquireatomics and non-atomic memory accesses, thenhappens-beforeis transitive (and acyclic) by construction.Similarly to
SCgraph, in case ofrelease/acquireedges also always point downwards:(BTW notice that this graph is different from
SC:Thread 1andThread 4seeWrel(a)andWrel(b)in different orders. But edges point downwards nonetheless)Edges from
Wrel(x)toRacq(x)always point downwards becauseRacq(x)seesWrel(x)and everything beforeWrel(x)as completed (See Notes in the end of the answer).(In C++ spec this is called
synchronizes-with, you can learn more about it here.)As a result, similarly to
SCgraph, all edges always point downwards=> the graph cannot have cycles
=> it's always a DAG
=>
happens-beforeaxioms cannot be violatedActually
happens-beforeproduced byrelease/acquireatomics — is basically the originalhappens-beforeintroduces by Leslie Lamport in Time, Clocks and the Ordering of Events in a Distributed System.I really recommend to read the paper to everyone interested in
HB— Lamport's explanations are short and clear, and the idea is really cool.Let's also demonstrate with picture why cycles aren't possible.
This a what a cycle looks like:
Within each thread
happens-beforeis the order of actions in the source code (it' calledsequenced-beforein C++ andprogram orderin Java).Clearly, no
HBcycles are possible here.It means that the edge which "goes back" and closes the cycle must be a
synchronizes-withedge like theWrel(a)->Racq(a)edge in the graph above.Notice a contradiction:
Wrel(a)must complete beforeRacq(a), becauseRacq(a)reads the value written byWrel(a)Racq(a)must complete beforeWrel(a), because there is a chainRacq(a)->Wrel(b)->Racq(b)->Wrel(c)->Racq(c)->Wrel(a)where everyWrel(x)(and everything before it) completes beforeRacq(x)reads it.This means the
Wrel(a)->Racq(a)edge is not allowed => cycles aren't possible.In terms of the C++ memory model it violates coherence requirements:
Notes.
I stated that:
but in C++ standard that isn't stated directly.
Instead it has this:
happens-beforeis what defines what a read seesRacq(x)andWrel(x)is calledsynchronizes-withhappens-beforeis built fromsynchronizes-withand huge amount of other rules and ordersIt's possible to deduce my statement from the C++ standard, although it might not be easy. (That is why I recommend to read this article instead).
I used the statement because it concisely describes what memory barrier instructions do, and this is how
happens-beforecan be (and probably is) easily implemented .So often a memory barrier instruction is all we need to implement
happens-beforewith all its mathematical properties.For an overview of these instructions on different CPUs I would recommend the section on that in Memory Barriers: a Hardware View for Software Hackers by Paul E. McKenney.
(For example, memory barriers in PowerPC basically work the same as
release/acquireatomics in C++)