C# Entity Framework: Bulk Extensions Input Memory Issue

1.1k views Asked by At

I am currently using EF Extensions. One thing I don't understand, "its supposed to help with performance"

however placing a million+ records into List variable, is a Memory Issue itself. So If wanting to update million records, without holding everything in memory, how can this be done efficiently?

Should we use a for loop, and update in batches say 10,000? Does EFExtensions BulkUpdate have any native functionality to support this?

Example:

var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == 'Electronics');  // this creates IQueryable

await productUpdate.ForEachAsync(c => c.ProductBrand = 'ABC Company');

_dbContext.BulkUpdateAsync(productUpdate.ToList());

Resource:

https://entityframework-extensions.net/bulk-update

2

There are 2 answers

8
Pac0 On BEST ANSWER

I found the "proper" EF Extensions way to do a bulk update with a query-like condition:

var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == 'Electronics')
    .UpdateFromQuery( x => new Product { ProductBrand = "ABC Company" });

This should result in a proper SQL UPDATE ... SET ... WHERE, without the need to load entities first, as per the documentation:

Why UpdateFromQuery is faster than SaveChanges, BulkSaveChanges, and BulkUpdate?

UpdateFromQuery executes a statement directly in SQL such as UPDATE [TableName] SET [SetColumnsAndValues] WHERE [Key].

Other operations normally require one or multiple database round-trips which makes the performance slower.

You can check the working syntax on this dotnet fiddle example, adapted from their example of BulkUpdate.

Other considerations

  • No mention of batch operations for this, unfortunately.

  • Before doing a big update like this, it might be worth considering deactivating indexes you may have on this column, and rebuild them afterward. This is especially useful if you have many of them.

  • Careful about the condition in the Where, if it can't be translated as SQL by EF, then it will be done client side, meaning the "usual" terrible roundtrip "Load - change in memory - update"

6
Flater On

This is actually something that EF is not made for. EF's database interactions start from the record object, and flow from there. EF cannot generate a partial UPDATE (i.e. not overwriting everything) if the entity wasn't change tracked (and therefore loaded), and similarly it cannot DELETE records based on a condition instead of a key.

There is no EF equivalent (without loading all of those records) for conditional update/delete logic such as

UPDATE People
SET FirstName = 'Bob'
WHERE FirstName = 'Robert'

or

DELETE FROM People
WHERE FirstName = 'Robert'

Doing this using the EF approach will require you to load all of these entities just to send them back (with an update or delete) to the database, and that's a waste of bandwidth and performance as you've already found.

The best solution I've found here is to bypass EF's LINQ-friendly methods and instead execute the raw SQL yourself. This can still be done using an EF context.

using (var ctx = new MyContext())
{
    string updateCommand = "UPDATE People SET FirstName = 'Bob' WHERE FirstName = 'Robert'";
    int noOfRowsUpdated = ctx.Database.ExecuteSqlCommand(updateCommand);

    string deleteCommand = "DELETE FROM People WHERE FirstName = 'Robert'";
    int noOfRowsDeleted = ctx.Database.ExecuteSqlCommand(deleteCommand);
}

More information here. Of course don't forget to protect against SQL injection where relevant.

The specific syntax to run raw SQL may vary per version of EF/EF Core but as far as I'm aware all versions allow you to execute raw SQL.


I can't comment on the performance of EF Extensions or BulkUpdate specifically, and I'm not going to buy it from them.

Based on their documentation, they don't seem to have the methods with the right signatures to allow for conditional update/delete logic.

  • BulkUpdate doesn't seem to allow you to input the logical condition (the WHERE in your UPDATE command) that would allow you to optimize this.
  • BulkDelete still has a BatchSize setting, which suggests that they are still handling the records one at a time (well, per batch I guess), and not using a single DELETE query with a condition (WHERE clause).

Based on your intended code in the question, EF Extensions isn't really giving you what you need. It's more performant and cheaper to simply execute raw SQL on the database, as this bypasses EF's need to load its entities.

Update
I might stand corrected, there is some support for conditional update logic, as seen here. However, it is unclear to me while the example still loads everything in memory and what the purpose of that conditional WHERE logic then is if you've already loaded it all in memory (why not use in-memory LINQ then?)

However, even if this works without loading the entities, it's still:

  • more limited (only equality checks are allowed, compared to SQL allowing any boolean condition that is valid SQL),
  • relatively complex (I don't like their syntax, maybe that's subjective)
  • and more costly (still a paid library)

compared to rolling your own raw SQL query. I would still suggest rolling your own raw SQL here, but that's just my opinion.