Will a stream reader yield return items before the previous has been handled?


I have a C# method that reads a large report and occasionally ends up eating a lot of memory. I am not sure how iteration works with yield return, so I can't tell whether it is the problem. The snippet below shows the potential issue. Once we call verySlowMethodCall(item), streaming from the file has begun. Will the second line of the file be read only after we have finished with the first, or will the code just keep reading the file as fast as possible, using up lots of memory if we aren't processing the data and disposing of the objects fast enough?

public IEnumerable<SomeObject> ParseReport(string reportPath)
{
    using (var file = new StreamReader(reportPath))
    {
        while (!file.EndOfStream)
        {
            yield return JsonConvert.DeserializeObject<SomeObject>(file.ReadLine());
        }
    }
}
foreach(var item in ParseReport("file/path.txt"))
{
    verySlowMethodCall(item);
} 

So, say for example, the first call to verySlowMethodCall(item) takes 30 seconds to do its thing. Will the second line only be streamed from the file once the loop calls verySlowMethodCall(item) a second time, or could thousands of rows be read into memory by the time the first call to verySlowMethodCall returns?

1 Answer

Answered by JonasH

Your code would be conceptually equivalent to

using (var file = new StreamReader("file/path.txt"))
{
    while (!file.EndOfStream)
    {
        var item = JsonConvert.DeserializeObject<SomeObject>(file.ReadLine());
        verySlowMethodCall(item);
    }
}

It will only read the next line when the next item is requested. It cannot "read ahead". This lazy evaluation is the whole point of iterator blocks.
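You can see the lazy evaluation for yourself with a minimal, self-contained sketch. The `Produce` iterator here is a hypothetical stand-in for reading lines of the report; each "produced" line prints only when the foreach loop requests the next item, so production and consumption strictly interleave:

```csharp
using System;
using System.Collections.Generic;

class LazyDemo
{
    // Stand-in for an iterator that reads one line per yield.
    // Logs each time it actually produces a value.
    static IEnumerable<int> Produce()
    {
        for (int i = 1; i <= 3; i++)
        {
            Console.WriteLine($"produced {i}");
            yield return i;
        }
    }

    static void Main()
    {
        foreach (var item in Produce())
        {
            // Stand-in for the slow processing step.
            Console.WriteLine($"consumed {item}");
        }
    }
}
```

The output alternates `produced 1`, `consumed 1`, `produced 2`, `consumed 2`, and so on; value 2 is never produced while value 1 is still being consumed.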

You could rewrite it so that the reading and the processing happen in parallel, and that might help slightly with performance. But this requires much more complicated code, with multiple threads/tasks and a queue to move objects between them, or the use of a library, like Dataflow, that can hide some of the complexity of such a solution.
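As a rough sketch of that parallel approach (not the asker's actual code, and with illustrative names and numbers), a bounded `BlockingCollection<T>` can serve as the queue between a reader task and the processing loop. Because the capacity is bounded, the reader blocks once the queue is full, so memory use stays capped even when processing is slow:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineSketch
{
    static void Main()
    {
        // At most 100 items can wait in the queue; Add() blocks beyond that.
        using var queue = new BlockingCollection<string>(boundedCapacity: 100);

        var reader = Task.Run(() =>
        {
            // Stand-in for reading and deserializing lines of the report.
            for (int i = 0; i < 1000; i++)
                queue.Add($"line {i}");
            queue.CompleteAdding(); // signal that no more items are coming
        });

        int processed = 0;
        // GetConsumingEnumerable blocks until items arrive and ends
        // once CompleteAdding has been called and the queue drains.
        foreach (var line in queue.GetConsumingEnumerable())
        {
            // Stand-in for verySlowMethodCall(item).
            processed++;
        }

        reader.Wait();
        Console.WriteLine(processed); // prints 1000
    }
}
```

The bounded capacity is the key design choice: it trades a little throughput for a hard upper limit on how many deserialized objects exist at once.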

Note that the actual file is most likely buffered, so the OS will probably read a much larger chunk of data than is requested. This could possibly include the rest of the file if you have the memory available.
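On top of the OS cache, `StreamReader` itself reads from the file in buffered chunks, and the size of that managed buffer can be set explicitly via a constructor overload. A minimal sketch, with a placeholder file name (the 4 KB figure is just an example):

```csharp
using System;
using System.IO;
using System.Text;

class BufferDemo
{
    static void Main()
    {
        // Create a small placeholder file for the demo.
        File.WriteAllLines("report.txt", new[] { "a", "b" });

        // The bufferSize argument (here 4 KB) controls how much
        // StreamReader pulls from the stream per read.
        using (var file = new StreamReader(
            new FileStream("report.txt", FileMode.Open, FileAccess.Read),
            Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true,
            bufferSize: 4096))
        {
            Console.WriteLine(file.ReadLine()); // prints "a"
        }
    }
}
```

This buffer only smooths I/O; it does not change the lazy, one-item-at-a-time behaviour of the iterator block.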