I am trying to construct as quick as possible a set of elements based on the result of an image classification.
In the details, I would like to store in this set all the (r, g, b) pixels which belong to a certain class. The problem has 2 classes, I would like to retain the pixels which are from class 1 and discard the pixels from class 0.
The classification is done using a trained mlpack classifier on a (r, g, b) vector of double.
I have to use a boost::unordered_set<uint32_t> for this task or similar.
The code up to now looks like this
boost::unordered_set<uint32_t> bset;
for (int r = 0; r < 256; r++)
{
for (int g = 0; g < 256; g++)
{
for (int b = 0; b < 256; b++)
{
arma::rowvec xp = { (double)b, (double)g, (double)r };
if ((bool)(clf.Classify(xp))) {
uint32_t cachePos = r + (g << 8) + (b << 16);
bset.insert(cachePos);
}
}
}
}
I have made some benchmarks and the slowest part is the insertion with insert(). To scan all the possible (r, g, b) it takes about 5 seconds. Since the code is called from a GUI I would like it to be faster to reduce the time a user has to wait for the result.
First I tried to change .insert() with .emplace() but as I expected there is little improvement.
I also tried filling another container, actually std::vector was quite fast, and the copying its content in the set using iterators:
std::vector<int> predictions;
for (int r = 0; r < 256; r++)
{
for (int g = 0; g < 256; g++)
{
for (int b = 0; b < 256; b++)
{
arma::rowvec xp = { (double)b, (double)g, (double)r };
if ((bool)(clf.Classify(xp))) {
uint32_t cachePos = r + (g << 8) + (b << 16);
predictions.push_back(cachePos);
}
}
}
}
bset = boost::unordered_set<uint32_t>(predictions.begin(), predictions.end());
But still, the last line takes a lot of time, around 2-3 seconds. Do you have any hint for me?
What can I do to improve the speed of my code? Are there faster container that I can use to replace boost::unordered_set ? The container should contain elements from class 1 only.
Since unordered_set can not predict number of elements to be inserted, it performs allocation and insertion for each particular element. In your case it's exactly 256 x 256 x 256 = 16,777,216 times. The bottleneck in this situation is memory allocation. Since vector consumes memory by chunks - it inserts much faster. The solution is to use the 'reserve' method: