I was asked to wrap a blocking call into a non-blocking API during an interview today. So we (the interviewer and I) decided to achieve that by adding a background thread inside the non-blocking API. Here is the code I wrote:
30 #define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
31
32 struct SensorReading records[60*10] = {{0}};
33 size_t head = 0;
34
35
36 void * worker_thread(void *arg) {
37     while (1) {
38         size_t idx = (head + 1) % ARRAY_SIZE(records);
39         records[idx] = read_next_sample();
40         head = idx;
41     }
42 }
43
44 float get_most_recent_lux() {
45     static pthread_t worker = -1;
46     if (-1 == worker) {
47         struct SensorReading r = read_next_sample(); // This is the blocking call
48         records[0] = r;
49         if (-1 == pthread_create(&worker, NULL, worker_thread, NULL)) {
50             // error handling
51         }
52         return r.lux;
53     }
54     return records[head].lux;
55 }
Let me explain a little bit here:
- read_next_sample() is the blocking call provided.
- get_most_recent_lux() (line 44) is the wrapped non-blocking API I need to provide. Internally, it starts a thread which executes the worker_thread() function defined in line 36.
- worker_thread() keeps calling the blocking call and writes the data into a ring buffer.
- So the reader can read the most recent record from the ring buffer.
Also, please note:
- The programming language used here is C, not C++.
- This is a single-reader, single-writer case.
- This is different from the producer-consumer problem, since the wrapped API get_most_recent_lux() should always return the most recent data.
Since this is a single-reader, single-writer case, I believe:
- No lock is required here.
- No atomic variable is required here (so head in line 33 is not declared as an atomic variable, and I use a plain assignment (head = idx) in line 40).
Question: Is my above statement correct?
The interviewer kept telling me that my statement is not correct for all CPU architectures, so he believes a mutex or an atomic variable is required here. But I don't think so.
I do believe the single-line C assignment (head = idx) can be translated into multiple assembly instructions, but only the last of those instructions stores the updated value to memory. So:
- Before that last instruction executes, the updated value has not reached memory yet, so the reader will always read the old head value.
- After that last instruction executes, the reader will always read the updated head value.
- In both cases it is safe; no corruption will happen.
- There are no other possibilities. Within a time window where only one write can happen (say, a change from 1 to 2), the reader can only read either 1 or 2; it will never read any other value, like 0, 3, or 1.5.
Agree? I really cannot believe there is any CPU architecture on which this code won't work. If there is, please educate me. Thanks very much.
You don't need any atomic RMWs or seq_cst, but you do need _Atomic so you can do a release-store and an acquire-load to/from head. That ordering only comes for free on x86 (and SPARC), not on other ISAs, and it is still unsafe against compile-time reordering even when targeting x86.
head = idx; could become visible to the other core before the updates to records[idx], letting the reader see stale values. (Well, the records[head].lux load will actually work on most ISAs, much as with mo_consume, since every ISA other than DEC Alpha guarantees dependency ordering for loads.)

I think there are some other similar Q&As on SO about trying to use non-atomic variables for inter-thread communication. There is little point: just use atomic_store_explicit with memory_order_release. It compiles to the same instruction on x86 as a non-atomic store, but with compile-time ordering guarantees, so you gain no efficiency by avoiding stdatomic.h if you only use acquire and release. (Except for loads: if you want actual dependency ordering without a barrier on weakly-ordered ISAs, you have to use relaxed under controlled conditions, because consume is currently semi-deprecated and promotes to acquire in current compilers.) See When to use volatile with multi threading? for more about why hand-rolled atomics can work in practice but have no advantage.
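For concreteness, here is a minimal sketch of the same design with the index made _Atomic. It assumes a SensorReading type with just a lux member and the read_next_sample() function from the question, and it leaves out the lazy thread start to stay short:

#include <stdatomic.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

struct SensorReading { float lux; };                  /* assumed layout: only .lux is used in the question */
extern struct SensorReading read_next_sample(void);   /* the blocking call, provided elsewhere */

static struct SensorReading records[60 * 10];
static _Atomic size_t head = 0;                       /* the shared index, now _Atomic */

void *worker_thread(void *arg) {
    (void)arg;
    size_t idx = 0;                                   /* writer-private copy of the index */
    for (;;) {
        idx = (idx + 1) % ARRAY_SIZE(records);
        records[idx] = read_next_sample();            /* fill the slot first... */
        /* ...then publish it: any reader that sees this index with an
           acquire load also sees the completed write to records[idx]. */
        atomic_store_explicit(&head, idx, memory_order_release);
    }
    return NULL;
}

float get_most_recent_lux(void) {
    /* The acquire load pairs with the writer's release store, so the slot
       the index points to is fully written by the time we read it. */
    size_t idx = atomic_load_explicit(&head, memory_order_acquire);
    return records[idx].lux;
}

On x86 the store and load here compile to the same plain mov instructions as the original code; the difference is that the compiler is no longer allowed to reorder them across the records[idx] accesses.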
Also, note that you have no protection against the buffer becoming full and overwriting values that haven't been read yet. SPSC queues like this typically have the consumer side update a read index that the writer can check.
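If every sample had to be consumed, a sketch of that usual arrangement might look like the following; the names, the queue size, and the drop-when-full policy are illustrative, not taken from the question:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 16                              /* illustrative capacity */

struct SensorReading { float lux; };               /* assumed layout */

static struct SensorReading queue[QUEUE_SIZE];
static _Atomic size_t write_idx = 0;               /* next slot to write, owned by the producer */
static _Atomic size_t read_idx = 0;                /* next slot to read, owned by the consumer */

/* Producer: refuses to overwrite a slot the consumer has not read yet. */
bool spsc_push(struct SensorReading r) {
    size_t w = atomic_load_explicit(&write_idx, memory_order_relaxed);
    size_t next = (w + 1) % QUEUE_SIZE;
    if (next == atomic_load_explicit(&read_idx, memory_order_acquire))
        return false;                              /* full: the consumer hasn't caught up */
    queue[w] = r;                                  /* fill the slot... */
    atomic_store_explicit(&write_idx, next, memory_order_release); /* ...then publish it */
    return true;
}

/* Consumer: returns false if nothing new is available. */
bool spsc_pop(struct SensorReading *out) {
    size_t rd = atomic_load_explicit(&read_idx, memory_order_relaxed);
    if (rd == atomic_load_explicit(&write_idx, memory_order_acquire))
        return false;                              /* empty */
    *out = queue[rd];                              /* copy the entry out... */
    atomic_store_explicit(&read_idx, (rd + 1) % QUEUE_SIZE, memory_order_release); /* ...then free the slot */
    return true;
}

For the "always return the newest sample" semantics asked for here, overwriting old entries may actually be the intended behavior, in which case the single _Atomic head index from the earlier sketch is enough.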