How to automatically merge selected data and re-cluster in OpenRefine?


I have a large data set that I use in a bibliometrics project. I want to use the fingerprint method in OpenRefine to merge similar but non-identical titles. In OpenRefine, I can only click "Merge selected and re-cluster" manually, for a total of 5000 choices. Given the size of my data set, this method is tedious at best.

Is there a way to automate the process so that it keeps running as long as there are clusters to be found?

I tried looking for information online, but could not find much given my limited knowledge.

Thanks,

There is 1 answer

b2m On

OpenRefine offers a "human-in-the-loop" approach to clustering because clustering methods are not foolproof: they can produce false positives, that is, group values that should not be merged.

In the dialog window (see the OpenRefine Documentation on Clustering for an example) there is also a button labeled "Select all" to automatically select all found clusters.

In my experience, fingerprinting quickly converges to a state where no more clusters can be found. So for this one project, I would expect the clustering dialog with the "Select all" button to be faster than finding your way around the API to automate the process.
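
To illustrate why fingerprinting converges so quickly, here is a minimal Python sketch of a fingerprint-style key-collision clustering. It is only an approximation written for illustration, not OpenRefine's actual implementation: the function names (`fingerprint`, `find_clusters`, `merge_all`) are hypothetical, the punctuation and accent handling is simplified, and choosing the most frequent value as the merge target is an assumption.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Approximate fingerprint key (simplified; not OpenRefine's exact rules)."""
    # Trim surrounding whitespace and lowercase.
    text = value.strip().lower()
    # Fold accented characters to their ASCII base letters.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate and sort the tokens, then rejoin.
    return " ".join(sorted(set(text.split())))


def find_clusters(values):
    """Group distinct values sharing a key; keep groups with more than one member."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def merge_all(values):
    """Equivalent of 'Select all' + merge: map each cluster to one of its members."""
    counts = Counter(values)
    mapping = {}
    for cluster in find_clusters(values):
        # Assumption: merge into the most frequent value (ties broken by string order).
        target = max(cluster, key=lambda v: (counts[v], v))
        for v in cluster:
            mapping[v] = target
    return [mapping.get(v, v) for v in values]


titles = [
    "Deep Learning: A Review",
    "deep learning a review",
    "A Review, Deep Learning",
    "Graph Theory",
]
merged = merge_all(titles)
print(find_clusters(titles))   # clusters before merging
print(find_clusters(merged))   # clusters after one merge pass
```

In this sketch, every member of a cluster is replaced by a value that has the same key, so a second pass finds no clusters for the same method. That matches the behaviour described above, but it also shows the risk: titles that differ only in word order or punctuation are merged without review.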