Problem trying to clone a Project Server database using OData and Entity Framework


I am having trouble updating my entities with Parallel.ForEach. The program works fine when I use foreach to update the entities, but with Parallel.ForEach it throws this error: "ArgumentException: An item with the same key has already been added". I have no idea why this happens. Shouldn't it be thread-safe? Why am I getting this error, and how can I resolve it?

The program reads some data from one database and copies it to another. If a row with the same GUID (see below) exists in the second database and its status is unchanged, the matching row in the second database must be updated. If there is a match but the status has changed, the modifications must be ignored. Finally, if there is no match in the second database, the row is inserted into it. (In other words, it synchronizes the two databases.) I just want to speed up the process, which is why I first thought of parallel processing.

(I am using Autofac as an IoC container for dependency injection, if that matters.)

Here is the code snippet which tries to update:

     /* @param reports: data from the first database */
   public string SynchronizeData(List<Reports> reports, int statusid)
    {
        // reportdataindatabase - the second database data, List() actually selects all, see next code snippet

        List<Reports> reportdataindatabase = unitOfWorkTAFeedBack.ReportsRepository.List().ToList();

        int allcount = reports.Count;
        int insertedcount = 0;
        int updatedcount = 0;
        int ignoredcount = 0;

     // DOES NOT WORK, GIVES THE ERROR
        Parallel.ForEach(reports, r =>
        {
            var guid = reportdataindatabase.FirstOrDefault(x => x.AssignmentGUID == r.AssignmentGUID);

            if (guid == null)
            {
                unitOfWorkTAFeedBack.ReportsRepository.Add(r); // an insert on the repository
                insertedcount++;
            }
            else
            {
               if (guid.StatusId == statusid)
                {
                    r.ReportsID = guid.ReportsID;
                    unitOfWorkTAFeedBack.ReportsRepository.Update(r); // update on the repo
                    updatedcount++;
               }
                else
               {
                    ignoredcount++;
                }

            }
        });




 /* WORKS PERFECTLY BUT RELATIVELY SLOW - takes 80 seconds to update 1287 records
        foreach (Reports r in reports)
        {
            var guid = reportdataindatabase.FirstOrDefault(x => x.AssignmentGUID == r.AssignmentGUID); // find match between the two databases

            if (guid == null)
            {
                unitOfWorkTAFeedBack.ReportsRepository.Add(r); // no match, insert
                insertedcount++;
            }
            else
            {
                if (guid.StatusId == statusid)
                {
                    r.ReportsID = guid.ReportsID;
                    unitOfWorkTAFeedBack.ReportsRepository.Update(r); 
                    updatedcount++;
                }
                else
                {
                    ignoredcount++;
                }

            }

        } */

        unitOfWorkTAFeedBack.Commit(); // this only calls SaveChanges() on DbContext object

        int allprocessed = insertedcount + updatedcount + ignoredcount;

        string result = "Synchronization finished.  " + allprocessed + " reports processed out of " + allcount + ", " 
            + insertedcount + " has been inserted, " + updatedcount + " has been updated and " 
            + ignoredcount + " has been ignored. \n Press a button to dismiss this window."  ;

        return result;

    }

The program breaks in the Update method of this repository class (only with Parallel.ForEach; there is no problem with the standard foreach):

 public class EntityFrameworkReportsRepository : IReportsRepository
{

    private readonly TAFeedBackContext tAFeedBackContext;

    public EntityFrameworkReportsRepository(TAFeedBackContext tAFeedBackContext)
    {
        this.tAFeedBackContext = tAFeedBackContext;
    }

    public void Add(Reports r)
    {
        tAFeedBackContext.Reports.Add(r);
    }

    public void Delete(int Id)
    {
        var obj = tAFeedBackContext.Reports.Find(Id);
        tAFeedBackContext.Reports.Remove(obj);
    }

    public Reports Get(int Id)
    {
        var obj = tAFeedBackContext.Reports.Find(Id);
        return obj;
    }

    public IQueryable<Reports> List()
    {
        return tAFeedBackContext.Reports.AsNoTracking();
    }

    public void Update(Reports r)
    {
        var entry = tAFeedBackContext.Entry(r); // The Program Breaks At This Point!
        if (entry.State == EntityState.Detached)
        {
            tAFeedBackContext.Reports.Attach(r);
            tAFeedBackContext.Entry(r).State = EntityState.Modified;
        }
        else
        {
            tAFeedBackContext.Entry(r).CurrentValues.SetValues(r);
        }
    }


}

There are 1 answers

Seabizkit On

Please bear in mind that it is hard to give a complete answer, as there are things I need clarity on, but the comments below should help build a picture.

Parallel.ForEach(reports, r => // Parallel.ForEach is not the answer
{
    // reportdataindatabase is loaded beforehand, so it is OK here
    // do you really want FirstOrDefault rather than SingleOrDefault?
    var guid = reportdataindatabase.FirstOrDefault(x => x.AssignmentGUID == r.AssignmentGUID);

    if (guid == null)
    {
        // this only touches the context, not the DB; nothing has been executed yet
        unitOfWorkTAFeedBack.ReportsRepository.Add(r); // an insert on the repository
        //insertedcount++; you would need a lock
    }
    else
    {
        if (guid.StatusId == statusid)
        {
            r.ReportsID = guid.ReportsID;
            // this only touches the context, not the DB; nothing has been executed yet
            unitOfWorkTAFeedBack.ReportsRepository.Update(r); // update on the repo
            //updatedcount++; you would need a lock
        }
        else
        {
            //ignoredcount++; you would need a lock
        }
    }
});

The issue here is that reportdataindatabase can contain the same key twice, and the context is only updated after the fact, that is, when execution gets here:

unitOfWorkTAFeedBack.Commit();

The add/update may have been called twice for the same entity, and the commit above is where the real work happens. Doing the add/update calls in parallel won't save you any real time, as that part is quick.
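On the counters mentioned in the comments above: a lock works, but Interlocked.Increment is a lighter option for a plain int. Below is a minimal, self-contained sketch. The class name and the even/odd split are made up for illustration and simply stand in for the inserted/updated counters. It only addresses the counters: DbContext itself is not thread-safe, so Add/Update calls on a shared context would still have to stay on one thread.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original program.
public static class ParallelCounterSketch
{
    // Counts even and odd items in parallel using atomic increments.
    public static (int Even, int Odd) Count(int[] items)
    {
        int even = 0;
        int odd = 0;

        Parallel.ForEach(items, item =>
        {
            // A plain even++ / odd++ here could lose updates under contention.
            if (item % 2 == 0)
                Interlocked.Increment(ref even);
            else
                Interlocked.Increment(ref odd);
        });

        return (even, odd);
    }

    public static void Main()
    {
        var (even, odd) = Count(Enumerable.Range(1, 1287).ToArray());
        Console.WriteLine($"even={even}, odd={odd}, total={even + odd}");
    }
}
```

With atomic increments the two counters always add up to the number of items processed, which is not guaranteed with the unsynchronized ++ from the question.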

Taking 80 seconds to update 1287 records does seem long. This is the line that loads the second database's data:

List<Reports> reportdataindatabase = unitOfWorkTAFeedBack.ReportsRepository.List().ToList();

PS: Add how reports are retrieved. You want something like:

TAFeedBackContext db = new TAFeedBackContext();
var remoteReports = DatafromAnotherPLace; // include how this was retrieved
var localReports = db.Reports.ToList(); // these are tracked (by default)
foreach (var item in remoteReports)
{
    // I assume more than one match is invalid.
    var localEntity = localReports.SingleOrDefault(x => x.AssignmentGUID == item.AssignmentGUID);
    if (localEntity == null)
    {
        // add it, as it doesn't exist
        db.Reports.Add(new Reports() { /* set fields */ });
    }
    else
    {
        if (localEntity.StatusId == statusid) // only update if the status is the passed-in status
        {
            // why are you modifying the remote entity?
            item.ReportsID = localEntity.ReportsID;

            // update the remote entity? I get the impression it's from a different context;
            // if not, then cool, but you need to show how reports is retrieved
        }
        else
        {
            // status differs: nothing to do
        }
    }
}

db.SaveChanges();
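As a side note, scanning localReports with SingleOrDefault once per remote row is a linear search each time. A dictionary keyed by AssignmentGUID gives the same match in one lookup. The sketch below is only an illustration under assumptions: the Report class, the SyncAction enum and SyncPlanner are hypothetical stand-ins, Entity Framework is left out entirely, and AssignmentGUID is assumed to be unique among the local rows (ToDictionary throws an ArgumentException on duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Reports entity; only the fields used here.
public class Report
{
    public int ReportsID { get; set; }
    public Guid AssignmentGUID { get; set; }
    public int StatusId { get; set; }
}

public enum SyncAction { Insert, Update, Ignore }

public static class SyncPlanner
{
    // Decides, per remote row, whether it should be inserted, updated or ignored.
    public static List<(Report Remote, SyncAction Action)> Plan(
        IEnumerable<Report> remote, IEnumerable<Report> local, int statusId)
    {
        // Assumption: one local row per GUID; duplicates make ToDictionary throw.
        var localByGuid = local.ToDictionary(x => x.AssignmentGUID);
        var plan = new List<(Report Remote, SyncAction Action)>();

        foreach (var item in remote)
        {
            if (!localByGuid.TryGetValue(item.AssignmentGUID, out var match))
            {
                plan.Add((item, SyncAction.Insert));   // no match: insert
            }
            else if (match.StatusId == statusId)
            {
                item.ReportsID = match.ReportsID;      // reuse the local key
                plan.Add((item, SyncAction.Update));   // same status: update
            }
            else
            {
                plan.Add((item, SyncAction.Ignore));   // status differs: ignore
            }
        }

        return plan;
    }

    public static void Main()
    {
        var shared = Guid.NewGuid();
        var local = new[] { new Report { ReportsID = 1, AssignmentGUID = shared, StatusId = 5 } };
        var remote = new[]
        {
            new Report { AssignmentGUID = shared, StatusId = 5 },
            new Report { AssignmentGUID = Guid.NewGuid(), StatusId = 5 }
        };

        foreach (var (report, action) in Plan(remote, local, 5))
            Console.WriteLine($"{report.AssignmentGUID}: {action}");
    }
}
```

The resulting plan could then be applied to a single context in an ordinary foreach, followed by one SaveChanges call, which keeps all context access on one thread.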