I am looking to sort a ListBuffer(Int, Int, Int) by the third value most efficiently. This is what I currently have. I am using sortBy. Note y and z are both ListBuffer[(Int, Int, Int)], which I am taking the difference first. My goal is to optimize this and do this operation (taking the difference between the two lists and sorting by the third element) most efficiently. I am assuming the diff part cannot be optimized but the sortBy can, so I am looking for a efficient way to do to the sorting part. I found posts on sorting Arrays but I am working with ListBuffer and converting to an Array adds overhead, so I rather not to convert my ListBuffer.
val x = (y diff z).sortBy(i => i._3)
1) If you want to use Scala libraries then you can't do much better than that. Scala already tries to sort your collection in the most efficient way possible.
SeqLikedefinesdef sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr = sorted(ord on f)which calls this implementation:This is what your code will be calling. As you can see it already uses arrays to sort data in place. According to the javadoc of
public static void sort(Object[] a):2) If you try to optimize by inserting results of your diff directly into a sorted structure like a binary tree as you produce them element by element, you'll still be paying the same price: average cost of insertion is
log(n)timesnelements= n log(n)- same as any fast sorting algorithm like merge sort.3) Thus you can't optimize this case generically unless you optimize to your particular use-case.
3a) For instance,
ListBuffermight be replaced with aSetanddiffshould be much faster. In fact it's implemented as:which uses
-which in turn should be faster thandiffonSeqwhich has to build a map first:3b) You can also avoid sorting by using
_3as an index in an array. If you insert using that index your array will be sorted. This will only work if your data is dense enough or you are happy to deal with sparse array afterwards. One index might also have multiple values mapping to it, you'll have to deal with it as well. Effectively you are building a sorted map. You can use aMapfor that as well, but aHashMapwon't be sorted and aTreeMapwill requirelog(n)for add operation again.Consult Scala Collections Performance Characteristics to understand what you can gain based on your case.
4) Anyhow, sort is really fast on modern computers. Do some benchmarking to make sure you are not prematurely optimizing it.
To summarize complexity for different scenarios...
Your current case:
SeqLike:nto create a map fromthat+nto iterate overthis(map lookup is effectively constant time (C)) =2norO(n)O(n log(n))O(n) + O(n log(n))=O(n log(n)), more precisely:2n + nlog(n)If you use
Setinstead ofSeqLike:Set:nto iterate (lookup is C) =O(n)O(n) + O(n log(n))=O(n log(n)), more precisely:n + nlog(n)If you use
Setand array to insert:O(n) + O(0)=O(n), more precisely:n. Might not be very practical for sparse data.Looks like in the grand scheme of things it does not matter that much unless you have a unique case that benefits from last option (array).
If you would have a
ListBuffer[Int]rather thanListBuffer[(Int, Int, Int)]I would suggest to sort both collections first and then do a diff by doing a single pass through both of them at the same time. This would beO(nlog(n)). In your case a sort by_3is not sufficient to guarantee exact order in both collections. You can sort by all three fields of a tuple but that will change the original ordering. If you are fine with that and writing your own diff then it might be the fastest option.