I need to compare 2 databases on their datas. But I have to compare only the data, not the keys, which may be different (but the relations between objets must be the same).
Actually it is a need to verify the proper functioning of a data conversion tool.
Simple example:
I use an entity-framework on C# .Net and Sqlite, here's the database schema:
public partial class Document
{
[PrimaryKey, AutoIncrement]
public Int64 idDocument { get; set; }
public string name { get; set; }
}
public partial class Diagram
{
[PrimaryKey, AutoIncrement]
public Int64 idDiagram { get; set; }
public string title { get; set; }
}
public partial class Document_contains_diagram
{
public Int64 idDocument { get; set; }
public Int64 idDiagram { get; set; }
}
The database :
Tables in order : Document - Diagram - Document_contains_diagram
| idDocument | name | idDiagram | title | idDocument | idDiagram | ||
|---|---|---|---|---|---|---|---|
| 1 | "myFirstDocument" | 1 | "evolution" | 1 | 1 | ||
| 2 | "mySecondDocument" | 2 | "futur" | 2 | 2 |
Should be considered as same as:
| idDocument | name | idDiagram | title | idDocument | idDiagram | ||
|---|---|---|---|---|---|---|---|
| 3 | "myFirstDocument" | 8 | "evolution" | 3 | 8 | ||
| 4 | "mySecondDocument" | 6 | "futur" | 4 | 6 |
It's an example, the true database I want to compare contains more than 50 tables and often more than 10000 entries (RAM limit is a contraint too).
So I'm looking for an generic algorythm, or a tool to compare theses 2 databases that might help me in my search. All tools I've seen only check for strict equality of the tables. Maybe converting the database into a graph would works ?
I don't know of an automated tool to do this, but I would recommend building a one-off solution for finding a canonical arrangement of the data. Then comparing canonical versions.
What I mean is this. You'd export all of the data, and import it into SQLite. Then for each table you define a canonical order. You then migrate IDs into canonical ones, then migrate all of the IDs. Repeat.
Much of the structure can be automatically generated. But I'd recommend automatically generating a configuration that you then fix for the weird edge cases that always exist.
In your case the configuration might look something like this:
What your tool would do with it is:
So in your example it would take the first database, and after a bunch of work find that it was in canonical order. It would take the second database, and for
idDocumentwould replace 3 with 1 everywhere, and 4 with 2. ForidDiagramit would replace 8 with 1 and 6 with 2. Once it did this, the two databases are now identical, so they would match.