Following on from Read large txt file multithreaded?, I am wondering whether it is equivalent to pass each thread a sliced chunk of a Seq, and whether that handles the parallelism safely. Is StreamReader thread-safe?
Here is the code I am using to test this (any advice on, or criticism of, the pattern used is welcome :) )
let nthreads = 4

let Data = seq {
    use sr = new System.IO.StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
    }

let length = Data |> Seq.length
let packSize = length / nthreads

let groups =
    [ for i in 0 .. nthreads - 1 ->
        if i < nthreads - 1
        then Data |> Seq.skip (packSize * i) |> Seq.take packSize
        else Data |> Seq.skip (packSize * i) ]

let f = some_complex_function_modifying_data

seq { for a in groups -> f a }
|> Async.Parallel
|> Async.RunSynchronously
Your Data value has the type seq<string>, which means that it is lazy. When some computation accesses it, the lazy sequence creates a new instance of StreamReader and reads the data independently of other computations. You can easily see this by adding some printing to the seq { .. } block.

As a result, your parallel processing is actually fine. A new computation is created for every parallel thread, so the StreamReader instances are never shared.

Another question is whether this is actually a useful thing to do. Reading data from disk is often the bottleneck, so it may be faster to just process everything in one loop. Even if the parallel version works, Seq.length is a slow way to get the length (it has to read the whole file), and the same goes for Seq.skip, which re-reads all of the skipped lines for every chunk. A better (but more complex) solution would probably be to use the stream's Seek method to position each reader at its own byte offset.
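To make the "independent readers" point concrete, here is a minimal, self-contained sketch (the temp file and its three lines are invented for illustration). It is the same lazy-sequence pattern as in the question, with printing added: every enumeration prints a message, so enumerating the sequence twice shows two separate StreamReader instances being created.

```fsharp
open System.IO

// Hypothetical sample file, created here so the snippet runs on its own
let filePath = Path.GetTempFileName ()
File.WriteAllLines (filePath, [| "line 1"; "line 2"; "line 3" |])

// The lazy-sequence pattern from the question, with printing added.
// The printfn runs each time the sequence is enumerated from the start.
let data = seq {
    printfn "opening a new StreamReader"
    use sr = new StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}

// Two enumerations -> the message prints twice:
// two independent StreamReader instances, nothing shared
let firstCount = data |> Seq.length
let secondCount = data |> Seq.length
printfn "read %d and %d lines" firstCount secondCount
```

Because each enumeration re-reads the file from the beginning, the Seq.skip-based chunking in the question also makes every thread re-read the start of the file, which is exactly why seeking each reader to a precomputed byte offset would scale better.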