Long-running parallel Tasks with Entity Framework cause high CPU peak and memory usage


I am migrating a C# ASP.NET Core 7 project from SqlClient with plain SQL queries to Entity Framework. In one place the application runs multiple long-running tasks. It is a kind of simulation built around a big for loop, and so that the user can follow the progress, each task writes to the database dozens of times. The old SqlClient solution ran smoothly with minimal CPU and memory usage, but with EF everything freezes as soon as the threads start working.

I know that DbContext is not thread-safe, so each task creates its own DbContext right where the database inserts occur and disposes it as soon as it is no longer needed. Even so, once the for loop is running the whole computer freezes, and the web application stops responding.

The simplified controller:

    public SmContext db { get; set; }

    public SimulateRoundModel(SmContext db)
    {
        this.db = db;
    }

    public async Task<IActionResult> OnPost()
    {
        List<Match> matches = new CollectorClass(db).Collect();
        MyClass.Wrapper(matches);
        return Page();
    }

The simplified code:

    public static void Wrapper(List<Match> matches)
    {
        // Runs one long simulation per match in parallel
        Parallel.For(0, matches.Count,
            index =>
            {
                matches[index].longSim();
            });
    }

Match class:


    private SmContext db { get; set; }

    public Match(SmContext db)
    {
        this.db = db;
    }

    public void longSim()
    {
        db.Dispose(); // disposing the main dbcontext that the constructor receives, we don't want to use that

        using (SmContext db = new SmContext())
        {
            // some initial query and insert
        }

        for (int i = 0; i < 100; i++)
        {
            Thread.Sleep(5000);

            // some simulation

            db = new SmContext();

            SomeInsert(); // these are using the db for the insert
            SomeInsert();
            SomeInsert();

            db.Dispose();
        }
    }

We are talking about 5-50 matches. Parallel.For handled them very well with the old SqlClient solution; I have seen it run 200 matches without an issue. The tasks are not intensive, just simple logic and some queries, but they run for a long time. Ideally, I would like to keep saving the progress to the database without a major rewrite.

The ultimate question: is there a conceptual issue here that I am too new to recognize, or should this approach work fine, meaning that something fishy is going on in the parts of the code not shown here?

There is 1 answer

Answered by Guru Stron

This is more of a guess than something I can prove, but in my experience multiple SomeInsert calls on the same context look a bit suspicious. EF Core performs insert and update operations by relying on change tracking, and even if you use AsNoTracking, new entries are still handled by the change tracker. So if you are actually inserting a lot of data (and note that EF has never been well suited to batch inserts), the change tracker ends up holding a lot of entities, which can slow EF down considerably. I would suggest one of the following options:

  • Call ChangeTracker.Clear after inserting a considerable number of entities* (this can also be used instead of recreating the context outside the loop)
  • Recreate the context after inserting a considerable number of entities*
  • Use another technology or extension library (EFCore.BulkExtensions for example) supporting bulk inserts

* You will need to determine the optimal number of inserted entities after which to call SaveChanges and clear the tracker or recreate the context, as was done for an older version of EF in this answer.
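As a rough illustration of the first option, here is a minimal sketch of saving in batches and clearing the change tracker after each save. It assumes EF Core 5.0 or later (where `ChangeTracker.Clear()` is available) and a reference to the `Microsoft.EntityFrameworkCore` package. `ProgressEntry`, `SimContext`, `BatchInsert`, and `SaveInBatchesAsync` are hypothetical names standing in for the question's own types, and the batch size is an assumption to be tuned by measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, standing in for the question's SmContext.
public class ProgressEntry
{
    public int Id { get; set; }
    public int MatchId { get; set; }
    public int Step { get; set; }
}

public class SimContext : DbContext
{
    public SimContext(DbContextOptions<SimContext> options) : base(options) { }

    public DbSet<ProgressEntry> ProgressEntries => Set<ProgressEntry>();
}

public static class BatchInsert
{
    // Saves the entries in groups of batchSize and clears the change tracker
    // after each save, so tracked entities do not pile up in the context.
    // Returns the total number of state entries EF Core reports as written.
    public static async Task<int> SaveInBatchesAsync(
        SimContext db,
        IEnumerable<ProgressEntry> entries,
        int batchSize, // assumption: find a good value by measuring
        CancellationToken ct = default)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int pending = 0;
        int written = 0;

        foreach (var entry in entries)
        {
            db.ProgressEntries.Add(entry);
            pending++;

            if (pending >= batchSize)
            {
                written += await db.SaveChangesAsync(ct);
                db.ChangeTracker.Clear(); // stop tracking the entities just saved
                pending = 0;
            }
        }

        // Flush whatever is left after the last full batch.
        if (pending > 0)
        {
            written += await db.SaveChangesAsync(ct);
            db.ChangeTracker.Clear();
        }

        return written;
    }
}
```

For the second option, the same loop would dispose the context and create a new one at the points where this sketch calls `ChangeTracker.Clear()`.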

P.S.

    Parallel.For
    public void longSim()
    Thread.Sleep(5000);

I would strongly advise making longSim asynchronous by using await Task.Delay(5000) and switching to Parallel.ForEachAsync, which supports async methods. This will also let you use the async versions of the EF Core methods.
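The suggestion above could look roughly like the following sketch, which assumes .NET 6 or later (where `Parallel.ForEachAsync` was introduced). The `Match` class here is a stripped-down stand-in for the one in the question. `LongSimAsync`, `WrapperAsync`, `StepsCompleted`, and the `steps`, `pause`, and `maxParallel` parameters are hypothetical names added for illustration, and the database work is only indicated by a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stripped-down stand-in for the question's Match class.
public sealed class Match
{
    // Hypothetical counter, only here so the effect of the loop is observable.
    public int StepsCompleted { get; private set; }

    // Async counterpart of longSim(): Task.Delay does not block a thread
    // while waiting, unlike Thread.Sleep.
    public async Task LongSimAsync(int steps, TimeSpan pause, CancellationToken ct = default)
    {
        for (int i = 0; i < steps; i++)
        {
            await Task.Delay(pause, ct);

            // some simulation, then the inserts, for example with
            // await db.SaveChangesAsync(ct) on a context created here

            StepsCompleted++;
        }
    }
}

public static class MyClass
{
    // Async counterpart of Wrapper(): runs the simulations concurrently,
    // with at most maxParallel of them in flight at once.
    public static Task WrapperAsync(
        IEnumerable<Match> matches,
        int steps,
        TimeSpan pause,
        int maxParallel, // assumption: pick a cap that suits the workload
        CancellationToken ct = default)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            CancellationToken = ct
        };

        return Parallel.ForEachAsync(matches, options,
            async (match, token) => await match.LongSimAsync(steps, pause, token));
    }
}
```

With the values from the question, the page handler would then call something like `await MyClass.WrapperAsync(matches, 100, TimeSpan.FromSeconds(5), matches.Count);` instead of the blocking `MyClass.Wrapper(matches);`.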

One more thing worth taking into consideration is thread pool starvation, which can sometimes have similar side effects. But if the only change you made is the switch from SqlClient to EF Core, and that change leads to the observed behaviour, then thread pool starvation should not be the reason.