I was asked to wrap a blocking call into a non-blocking API during an interview today. So we (the interviewer and I) decided to achieve that by adding a background thread inside the non-blocking API. Here is the code I wrote:
30 #define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
31
32 struct SensorReading records[60*10] = {{0}};
33 size_t head = 0;
34
35
36 void * worker_thread(void *arg) {
37     while (1) {
38         size_t idx = (head + 1) % ARRAY_SIZE(records);
39         records[idx] = read_next_sample();
40         head = idx;
41     }
42 }
43
44 float get_most_recent_lux() {
45     static pthread_t worker = -1;
46     if (-1 == worker) {
47         struct SensorReading r = read_next_sample(); // This is the blocking call
48         records[0] = r;
49         if (-1 == pthread_create(&worker, NULL, worker_thread, NULL)) {
50             // error handling
51         }
52         return r.lux;
53     }
54     return records[head].lux;
55 }
Let me explain a little bit here:
- read_next_sample() is the blocking call provided.
- get_most_recent_lux() (line 44) is the wrapped non-blocking API I need to provide. Internally, it starts a thread which executes the worker_thread() function defined in line 36.
- worker_thread() keeps calling the blocking call and writes the data into a ring buffer.
- So the reader can read the most recent record from the ring buffer.
Also, please note:
- The programming language used here is C, not C++.
- This is a single-reader, single-writer case.
- This is different from the producer-consumer problem, since the wrapped API get_most_recent_lux() should always return the most recent data.
Since this is a single-reader, single-writer case, I believe:
- No lock is required here.
- No atomic variable is required here (so head in line 33 is not declared as an atomic variable, and I use a plain assignment (head = idx) in line 40).
Question: Is my above statement correct?
The interviewer kept telling me that my statement is not correct for all CPU architectures, so he believes a mutex or an atomic variable is required here. But I don't think so.
I do believe the single-line C assignment (head = idx) can be translated into multiple assembly instructions, but only the last of those instructions stores the updated value to memory. So:
- Before that last instruction executes, the updated value has not reached memory yet, so the reader will always read the old head value.
- After that last instruction executes, the reader will always read the updated head value.
- In both cases it is safe; no corruption will happen.
- There are no other possibilities. Within a time window where only one write can happen (say, a change from 1 to 2), the reader can only read either 1 or 2; it will never read any other value, like 0, 3, or 1.5.
Agree? I really cannot believe there is any CPU architecture on which this code won't work. If there is, please educate me. Thanks very much.
You don't need any atomic RMWs or seq_cst, but you do need _Atomic so you can do a release-store and an acquire-load to/from head. That ordering only comes for free on x86 (and SPARC), not on other ISAs, and it is still unsafe against compile-time reordering even when targeting x86.
head = idx; could become visible to the other core before the updates to records[idx], letting the reader see stale values. (Well, the records[head].lux load will actually work on most ISAs, much as with mo_consume, since every ISA other than DEC Alpha guarantees dependency ordering for loads.)

I think there are some other similar Q&As on SO about trying to use non-atomic variables for inter-thread communication. There is little point: just use atomic_store_explicit with memory_order_release. It compiles to the same instruction on x86 as a non-atomic store, but with compile-time ordering guarantees, so you gain no efficiency by avoiding stdatomic.h if you only use acquire and release. (Except for loads: if you want actual dependency ordering without a barrier on weakly-ordered ISAs, you have to use relaxed under controlled conditions, because consume is currently semi-deprecated and promotes to acquire in current compilers.) See When to use volatile with multi threading? for more about why hand-rolled atomics can work in practice but have no advantage.
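For concreteness, here is a minimal sketch of the same design with the index made _Atomic. It assumes a SensorReading type with just a lux member and the read_next_sample() function from the question, and it leaves out the lazy thread start to stay short:

#include <stdatomic.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

struct SensorReading { float lux; };                  /* assumed layout: only .lux is used in the question */
extern struct SensorReading read_next_sample(void);   /* the blocking call, provided elsewhere */

static struct SensorReading records[60 * 10];
static _Atomic size_t head = 0;                       /* the shared index, now _Atomic */

void *worker_thread(void *arg) {
    (void)arg;
    size_t idx = 0;                                   /* writer-private copy of the index */
    for (;;) {
        idx = (idx + 1) % ARRAY_SIZE(records);
        records[idx] = read_next_sample();            /* fill the slot first... */
        /* ...then publish it: any reader that sees this index with an
           acquire load also sees the completed write to records[idx]. */
        atomic_store_explicit(&head, idx, memory_order_release);
    }
    return NULL;
}

float get_most_recent_lux(void) {
    /* The acquire load pairs with the writer's release store, so the slot
       the index points to is fully written by the time we read it. */
    size_t idx = atomic_load_explicit(&head, memory_order_acquire);
    return records[idx].lux;
}

On x86 the store and load here compile to the same plain mov instructions as the original code; the difference is that the compiler is no longer allowed to reorder them across the records[idx] accesses.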
Also, note that you have no protection against the buffer becoming full and overwriting values that haven't been read yet. SPSC queues like this typically have the consumer side update a read index that the writer can check.
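If every sample had to be consumed, a sketch of that usual arrangement might look like the following; the names, the queue size, and the drop-when-full policy are illustrative, not taken from the question:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 16                              /* illustrative capacity */

struct SensorReading { float lux; };               /* assumed layout */

static struct SensorReading queue[QUEUE_SIZE];
static _Atomic size_t write_idx = 0;               /* next slot to write, owned by the producer */
static _Atomic size_t read_idx = 0;                /* next slot to read, owned by the consumer */

/* Producer: refuses to overwrite a slot the consumer has not read yet. */
bool spsc_push(struct SensorReading r) {
    size_t w = atomic_load_explicit(&write_idx, memory_order_relaxed);
    size_t next = (w + 1) % QUEUE_SIZE;
    if (next == atomic_load_explicit(&read_idx, memory_order_acquire))
        return false;                              /* full: the consumer hasn't caught up */
    queue[w] = r;                                  /* fill the slot... */
    atomic_store_explicit(&write_idx, next, memory_order_release); /* ...then publish it */
    return true;
}

/* Consumer: returns false if nothing new is available. */
bool spsc_pop(struct SensorReading *out) {
    size_t rd = atomic_load_explicit(&read_idx, memory_order_relaxed);
    if (rd == atomic_load_explicit(&write_idx, memory_order_acquire))
        return false;                              /* empty */
    *out = queue[rd];                              /* copy the entry out... */
    atomic_store_explicit(&read_idx, (rd + 1) % QUEUE_SIZE, memory_order_release); /* ...then free the slot */
    return true;
}

For the "always return the newest sample" semantics asked for here, overwriting old entries may actually be the intended behavior, in which case the single _Atomic head index from the earlier sketch is enough.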