I am trying to compare two CSV files that have the same data but with columns in different orders. How can I tweak my code to make it work when the column orders don't match between the CSV files? When the column orders match, the following code works:
Set<String> source = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(sourceFile)));
Set<String> target = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(targetFile)));
return source.containsAll(target) && target.containsAll(source);
For example, the above test passes when the source file and target file look like this:
source file:
a,b,c
1,2,3
4,5,6
target file:
a,b,c
1,2,3
4,5,6
However, with the same source file, it doesn't work if the target file looks like this:
target file:
a,c,b
1,3,2
4,6,5
A
Set relies on a properly functioning .equals method for comparison, whether detecting duplicates or comparing its elements to those in another Collection. When I saw this question, my first thought was to create a new class for Objects to put into your Set Objects, replacing the String Objects. But, at the time, it was easier and faster to produce the code in my previous answer.

Here is another solution, which is closer to my first thought. To start, I created a Pair class, which overrides .hashCode() and .equals(Object other).

The .equals(Object obj) and the .hashCode() methods were auto-generated by the IDE. As you know, .hashCode() should always be overridden when .equals is overridden. Also, some Collection Objects, such as HashMap and HashSet, rely on proper .hashCode() methods.

After creating class Pair<T,U>, I created class CompareCSV1. The idea here is to use a Set<Set<Pair<String, String>>> where you have Set<String> in your code.

A Pair<String, String> pairs a value from a column with the header for the column in which it appears.
A Set<Pair<String, String>> represents one row.
A Set<Set<Pair<String, String>>> represents all the rows.

This code has some things in common with the code in my first answer:
String[] Objects, with calls to the Arrays.asList method, as substitutes for your data sources.
I hard coded "," as the String split expression in main.

But, the new methods allow the String split expression to be passed. This allows separate String split expressions for the column header line and the data lines.
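
A minimal sketch of the idea described above might look like the following. It is not the original listing: the names CompareCsvSketch, toRows, and sameData are hypothetical, and the Pair class here is a hand-written stand-in for the IDE-generated one. It assumes the first line of each file is the header, that header names are unique, and that every data line has as many fields as the header.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class CompareCsvSketch {

    // Hypothetical stand-in for the Pair class: one header name plus one cell value.
    public static final class Pair<T, U> {
        final T first;
        final U second;

        Pair(T first, U second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Pair)) return false;
            Pair<?, ?> other = (Pair<?, ?>) obj;
            return Objects.equals(first, other.first) && Objects.equals(second, other.second);
        }

        // Must agree with equals so that HashSet lookups work.
        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    // Turns header + data lines into a set of rows; each row is a set of (header, value) pairs.
    public static Set<Set<Pair<String, String>>> toRows(
            List<String> lines, String headerSplit, String dataSplit) {
        Set<Set<Pair<String, String>>> rows = new HashSet<>();
        if (lines.isEmpty()) return rows;
        // Limit -1 keeps trailing empty fields instead of silently dropping them.
        String[] headers = lines.get(0).split(headerSplit, -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(dataSplit, -1);
            if (values.length != headers.length) {
                throw new IllegalArgumentException("Field count does not match header: " + line);
            }
            Set<Pair<String, String>> row = new HashSet<>();
            for (int i = 0; i < headers.length; i++) {
                row.add(new Pair<>(headers[i], values[i]));
            }
            rows.add(row);
        }
        return rows;
    }

    // Two inputs match when they produce the same set of rows, whatever the column order.
    public static boolean sameData(
            List<String> source, List<String> target, String headerSplit, String dataSplit) {
        return toRows(source, headerSplit, dataSplit).equals(toRows(target, headerSplit, dataSplit));
    }

    public static void main(String[] args) {
        // Arrays.asList stands in for reading the files, as in the answer.
        List<String> source = Arrays.asList("a,b,c", "1,2,3", "4,5,6");
        List<String> reordered = Arrays.asList("a,c,b", "1,3,2", "4,6,5");
        List<String> different = Arrays.asList("a,c,b", "1,2,3", "4,6,5");
        System.out.println(sameData(source, reordered, ",", ",")); // prints true
        System.out.println(sameData(source, different, ",", ",")); // prints false
    }
}
```

Like the Set<String> code in the question, this treats the rows as a set, so duplicate rows collapse and row order is ignored. The split expression is a regular expression, and plain splitting does not handle quoted fields that contain the delimiter.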