
Hi,
I'm making my first attempts at multithreading in C# within Grasshopper. The function is a simple distance-between-two-points calculation, with two inputs of 10,000 points each. The code works, but it takes much longer than a single-threaded component. I'm curious if anyone could explain why this is.
using System;
using System.Collections;
using System.Collections.Generic;
using Rhino;
using Rhino.Geometry;
using Grasshopper;
using Grasshopper.Kernel;
using Grasshopper.Kernel.Data;
using Grasshopper.Kernel.Types;
using System.Threading;
using System.Threading.Tasks;
using System.Collections.Concurrent;
private void RunScript(List<Point3d> x, List<Point3d> y, ref object A)
{
  var results = new ConcurrentDictionary<Point3d, double>(Environment.ProcessorCount, x.Count);
  List<double> output = new List<double>();
  foreach(Point3d pt in x) results[pt] = 0;
  Parallel.ForEach(x, pt =>
  {
    results[pt] = pt.DistanceTo(y[x.IndexOf(pt)]);
  });
  foreach(Point3d pt in x)
  {
    output.Add(results[pt]);
  }
  A = output;
}
I know there must be heavy overhead from using a ConcurrentDictionary, two loops and a list, but I assumed that the larger the number of points, the more benefit you would see over a single thread, which isn't the case. If I'm using a really inefficient method I'd like to know. Or is there little chance that the overhead will ever justify multithreading for such a simple task?
There are a few things you can do to optimize your code, two of which I tried with your code:
- Limit the degree of parallelism (using ParallelOptions). As the workload for each iteration is very small, this will limit the overhead of the parallelization.
- Use Parallel.For instead of Parallel.ForEach so you don't need to look up the index for each point. This has nothing to do with parallelization; it simply reduces the workload dramatically.
- Create output from result in one go instead of adding the results one by one.

Implementing these in your method could give something like:
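The following is only a minimal sketch of how these three changes might be combined, not the exact code from this answer. The class name DistanceSketch, the method name Distances, and the choice of Environment.ProcessorCount as the parallelism limit are assumptions made for illustration. It assumes RhinoCommon is referenced, as it is inside a Grasshopper script component.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Rhino.Geometry;

// Hypothetical helper class, named for illustration only.
public static class DistanceSketch
{
  public static List<double> Distances(List<Point3d> x, List<Point3d> y)
  {
    // Only pair up points that exist in both lists.
    int count = Math.Min(x.Count, y.Count);

    // Key by index instead of by point, so no IndexOf lookup is needed.
    var results = new ConcurrentDictionary<int, double>(Environment.ProcessorCount, count);

    // Assumption: cap the number of concurrent workers at the core count.
    var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

    Parallel.For(0, count, options, i =>
    {
      results[i] = x[i].DistanceTo(y[i]);
    });

    // Build the output in one go, reading the results back in index order.
    return Enumerable.Range(0, count).Select(i => results[i]).ToList();
  }
}
```

Inside RunScript this could then be called as A = DistanceSketch.Distances(x, y);. The results are read back by index because a ConcurrentDictionary does not guarantee the order of its values.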
Benchmark results for both:
Execution time is reduced to 0.7% of the original and memory allocation to 0.06% (greatly reducing the load on the garbage collector).
Obviously this optimized version will compare much more favorably to the single-threaded code than the original method...
Accomplishing this is even simpler. You don't need a ConcurrentDictionary for this. Since each iteration calculates and stores its result completely independently of the other iterations, no synchronization is required and you can use a simple array:
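As a rough sketch of the array-based approach (again not the original snippet, and with the hypothetical names ArrayDistanceSketch and Distances), each iteration writes only to its own slot of a preallocated array, so no locking or concurrent collection is involved:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Rhino.Geometry;

// Hypothetical helper class, named for illustration only.
public static class ArrayDistanceSketch
{
  public static double[] Distances(List<Point3d> x, List<Point3d> y)
  {
    // Only pair up points that exist in both lists.
    int count = Math.Min(x.Count, y.Count);

    // One slot per pair; iteration i is the only writer of results[i].
    var results = new double[count];

    Parallel.For(0, count, i =>
    {
      results[i] = x[i].DistanceTo(y[i]);
    });

    return results;
  }
}
```

In RunScript the array can be assigned to the output directly, for example A = ArrayDistanceSketch.Distances(x, y);. The input lists are only read during the loop, which is safe to do from several threads at once.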
Of course this will make your code even faster (and reduce the memory allocations even further):