Why is this multithreading code running slower than single thread?

Question

Why is this multithreading code running slower than single thread?

178 views Asked by Charles Boman At 31 July 2023 at 03:43

enter image description here

Hi,

I'm doing my first attempts at multithreading in c# within grasshopper. The function is a simple distance between two points function, with two inputs of 10,000 points. The code is working but its taking a much much longer time than a single threaded component. I'm curious if anyone could explain why this is.

using System;
using System.Collections;
using System.Collections.Generic;

using Rhino;
using Rhino.Geometry;

using Grasshopper;
using Grasshopper.Kernel;
using Grasshopper.Kernel.Data;
using Grasshopper.Kernel.Types;

using System.Threading;
using System.Threading.Tasks;
using System.Collections.Concurrent;


  private void RunScript(List<Point3d> x, List<Point3d> y, ref object A)
  {

    var results = new ConcurrentDictionary<Point3d, double>(Environment.ProcessorCount, x.Count);

    List<double> output = new List<double>();
    foreach(Point3d pt in x) results[pt] = 0;


    Parallel.ForEach(x, pt =>
      {
      results[pt] = pt.DistanceTo(y[x.IndexOf(pt)]);
      });

    foreach(Point3d pt in x)
    {
      output.Add(results[pt]);
    }

    A = output;

I know there must be heavy overheads with using concurrentdictionary, two loops and list but I assumed the higher the population of points, the more you would see a benefit over a single thread, which isn't the case. If I'm using a really inefficient method I'd like to know, or if for such a simple task there's little chance the overheads will ever justify multi-threading?

Original Q&A

There are 2 answers

**Johan Donne** · Answer 1 · 2023-07-31T05:53:56+00:00

There are a few things you can do to optimize your code, two of which I tried with your code:

Limit the number of threads you are using (using ParallelOptions). As the workload for each iteration is very small, this will limit the overhead of the parallelization.
Use Parallel.For instead of Parallel.ForEach so you don't need to lookup the index for each point. This has nothing to do with parallelization but just reduces the workload in a dramatical manner.
Create output from result in one go instead of adding the results one by one.

Implementing these in your method could give something like:

private void RunScriptOpt(List<Point3D> x, List<Point3D> y, ref List<double> A)
{
    var results = new ConcurrentDictionary<Point3D, double>(Environment.ProcessorCount, x.Count);
    List<double> output = new List<double>();
    Parallel.For(0, x.Count, new ParallelOptions { MaxDegreeOfParallelism = 4 }, (i) =>
    {            
        results[x[i]] = x[i].DistanceTo(y[i]);
    });       
    
    output = results.Values.ToList();

    A = output;
}

Benchmarkresults for both:

Method	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
NaiveParallel	105,161.0 us	2,039.82 us	2,652.35 us	159600.0000	1000.0000	800.0000	1909.71 MB
OptParallel	736.1 us	4.65 us	4.12 us	99.6094	98.6328	49.8047	1.18 MB

Executing time is reduced to 0.7% of the original and memory allocation to 0.06% (reducing load on the garbage collector a lot).

Obviously this optimized version will compare much more favorably to the single-threaded code than the original method...

I should clarify I'm trying to output the distances so they associate with the order of the points going in,

Accomplishing this is even more simple. You don't need a ConcurrentDictionary for this. Since each iteration will calculate and store the result completely independent from the other iterations, no synchronization is required and you can use a simple array:

private void RunScriptOpt(List<Point3D> x, List<Point3D> y, ref double[] A)
{
    double[] results = new double[x.Count];
    Parallel.For(0, x.Count, new ParallelOptions { MaxDegreeOfParallelism = 10 }, (i) =>
    {            
        results[i] = x[i].DistanceTo(y[i]);
    });                       
    
    A = results;
}

Of course this will make your code still faster (and reduce the memory allocations even further):

Method	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
NaiveParallel	143,491.07 us	9,306.979 us	27,441.839 us	159600.0000	32800.0000	600.0000	1955547.13 KB
OptParallel	27.47 us	0.203 us	0.190 us	6.6528	2.2888	-	81.42 KB

**Guru Stron** · Answer 2 · 2023-07-31T06:53:03+00:00

I would argue that the biggest issue in the code is the use of IndexOf which makes your code effectively O(n^2) so I would start from testing out the single-threaded approach. If you think about your data manipulation happening here - you are matching incoming collection by index and computing the result, which is what the Zip is for, so try something like (add using System.Linq;):

private void RunScript(List<Point3d> x, List<Point3d> y, ref object A)
{
    // if needed - check that both sequences are of the same size
    A = x.Zip(y, (l, r) => l.DistanceTo(r))
        .ToArray();
}

Note that Zip will produce result with size of the shortest collection:

The method merges each element of the first sequence with an element that has the same index in the second sequence. If the sequences do not have the same number of elements, the method merges sequences until it reaches the end of one of them. For example, if one sequence has three elements and the other one has four, the result sequence will have only three elements.

If you have really that much items you can try using PLINQ before going deep into different kinds of optimizations:

private void RunScript(List<Point3d> x, List<Point3d> y, ref object A)
{
    // if needed - check that both sequences are of the same size
    A = x
        .AsParallel()
        .Zip(y, (l, r) => l.DistanceTo(r))
        .ToArray();
}

If this method is a part of critical path and you want to squeeze as much performance as possible then you can go with the 3rd option from the another answer - preallocate the resulting array of needed size and fill it (no need for concurrent collections here because there would be no contetntion/race guarantees - every read/write will for different index). Or similar (preallocate results array and use for loop) single-threaded approach (parallelization usually make sense either for big inputs sizes or relatively long computations performed).

P.S.

Use BenchmarkDotNet for benchmarking, but note that in general there are a lot of moving parts and same code can behave differently based on number of factors (.NET version, OS, CPU architecture ...)

TechQA.

Why is this multithreading code running slower than single thread?

There are 2 answers

Related Questions in C#

Related Questions in MULTITHREADING

Related Questions in GRASSHOPPER

Popular Questions

Trending Questions