Consider the following code:
var channel = Channel.CreateUnbounded<int>(new UnboundedChannelOptions());
var t1 = Task.Run(async () =>
{
    DateTime start = DateTime.Now;
    for (int i = 0; i < 100000000; i++)
    {
        await channel.Writer.WriteAsync(i);
    }
    Console.WriteLine($"Writer took {DateTime.Now - start}");
    channel.Writer.Complete();
});
var t2 = Task.Run(async () =>
{
    while (true)
    {
        try
        {
            int r = await channel.Reader.ReadAsync();
        }
        catch (ChannelClosedException) { break; }
    }
});
await Task.WhenAll(t1, t2);
This takes about 10 seconds, i.e. it outputs something like "Writer took 00:00:10.276747". If I comment out the whole while block, it takes about 6 seconds. This is consistent across multiple runs.
Question: if Channel is supposed to be an efficient producer/consumer mechanism, why does consuming in this case affect the producer?
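For reference, the consumer above doesn't have to rely on catching ChannelClosedException at all: ChannelReader<T> exposes ReadAllAsync (available since .NET Core 3.0), which completes normally once the writer calls Complete(). A minimal self-contained sketch (with a much smaller item count than the question's 100M):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        var channel = Channel.CreateUnbounded<int>();

        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < 1000; i++)
                await channel.Writer.WriteAsync(i);
            channel.Writer.Complete();
        });

        long sum = 0;
        // ReadAllAsync's enumeration ends normally when the writer
        // completes the channel, so no exception handling is needed.
        await foreach (int item in channel.Reader.ReadAllAsync())
            sum += item;

        await producer;
        Console.WriteLine(sum); // 0 + 1 + ... + 999 = 499500
    }
}
```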
More curiously, if I add these two methods:
static async Task Produce(Channel<int> channel)
{
    DateTime start = DateTime.Now;
    for (int i = 0; i < 100000000; i++)
    {
        await channel.Writer.WriteAsync(i);
    }
    Console.WriteLine($"Writer took {DateTime.Now - start}");
    channel.Writer.Complete();
}

static async Task Consume(Channel<int> channel)
{
    while (true)
    {
        try
        {
            int r = await channel.Reader.ReadAsync();
        }
        catch (ChannelClosedException) { break; }
    }
}
and then do:
var t1 = Produce(channel);
var t2 = Consume(channel);
await Task.WhenAll(t1, t2);
They finish in about 6 seconds either way (whether the while block is commented out or not).
Question: Why does involving an explicit thread with Task.Run affect the efficiency?
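One factor worth noting (my illustration, not part of the question): an async method runs synchronously on the caller's thread until the first await that actually suspends, and WriteAsync on an unbounded channel normally completes synchronously. So without Task.Run, the producer can run entirely on the calling thread. A sketch demonstrating this:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class Demo
{
    static async Task Produce(ChannelWriter<int> writer)
    {
        for (int i = 0; i < 5; i++)
            await writer.WriteAsync(i);
        writer.Complete();
    }

    static void Main()
    {
        var channel = Channel.CreateUnbounded<int>();

        // WriteAsync on an unbounded channel completes synchronously, so
        // every await in Produce continues immediately: the method runs
        // to completion on the calling thread before returning its Task.
        Task t = Produce(channel.Writer);

        Console.WriteLine(t.IsCompleted); // True
    }
}
```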
This is an interesting question, but not because of any lack of efficiency. In fact, the question's numbers show that channels are very efficient. Writing to an unbounded channel involves:
This means that enqueuing and waking a reader takes only about 66% more time than simply enqueueing into a ConcurrentQueue. That's not bad at all. Unfortunately, that number is deceptive, especially in this case, where a Task or ValueTask is larger than the int payload and the "work" is negligible. Benchmark libraries like BenchmarkDotNet run tests multiple times until they can get a statistically stable sample, with warmup and cooldown steps to account for JIT and caching effects.
To get a baseline, I used BenchmarkDotNet with this benchmark class. I couldn't resist adding a parameter for the SingleReader optimization, which assumes there can be only a single reader at a time and so uses a simpler queue and locking scheme. And I got a big surprise that should have been expected:
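The SingleReader (and SingleWriter) flags are set on UnboundedChannelOptions; they are hints that let the channel pick a simpler internal implementation. A minimal sketch of that option usage (this is the standard System.Threading.Channels API, not the benchmark class referred to above), paired with the WaitToReadAsync/TryRead consumption pattern a single reader permits:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        // SingleReader/SingleWriter are promises made by the caller:
        // when set, the channel may use a cheaper queue and less locking.
        var channel = Channel.CreateUnbounded<int>(new UnboundedChannelOptions
        {
            SingleReader = true,
            SingleWriter = true
        });

        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < 100; i++)
                await channel.Writer.WriteAsync(i);
            channel.Writer.Complete();
        });

        int count = 0;
        // WaitToReadAsync returns false once the channel is completed
        // and drained, so the loop exits without exceptions.
        while (await channel.Reader.WaitToReadAsync())
            while (channel.Reader.TryRead(out _))
                count++;

        await producer;
        Console.WriteLine(count); // 100
    }
}
```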
With these values:
Completed Work Items is the number of tasks completed in the ThreadPool. The benchmarks with methods don't use the ThreadPool at all; of course they don't, since they don't use Task.Run! The code that uses methods doesn't use multiple threads, so there are no lock conflicts. The same goes for the code that has no producers. This means the benchmarks can't be compared directly. Even so, it's obvious that using SingleReader uses less memory.
The entire benchmark with the 100M items took 28 minutes, so I'll wait a bit before creating a new, correct benchmark with far fewer items.
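As far as I can tell, the Completed Work Items metric corresponds to the runtime's own counter, exposed since .NET Core 3.0 as ThreadPool.CompletedWorkItemCount. A small sketch showing that only code dispatched through the pool (e.g. via Task.Run) moves it:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        long before = ThreadPool.CompletedWorkItemCount;

        // Task.Run queues a work item to the ThreadPool; awaiting an
        // async method directly (as in the Produce/Consume version)
        // would not, by itself, schedule anything there.
        await Task.Run(() => Thread.Sleep(10));

        // Give the pool a moment to account for the finished item.
        await Task.Delay(50);

        long after = ThreadPool.CompletedWorkItemCount;
        Console.WriteLine(after > before); // True
    }
}
```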