I got hold of a super-fast algorithm that generates an array of uniformly distributed random bytes. It's 6 times faster than the C++ standard library's uniform distribution combined with its Mersenne Twister.
The length of the array is divisible by 4, so it can be interpreted as an array of 32-bit integers. Reinterpreting each group of 4 bytes as an integer produces values in the range [INT_MIN, INT_MAX]. But how can I transform these integer values to lie within my own range [min, max]?
I want to avoid any if-else, so that the code stays branch-free.
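Here is a minimal sketch of the "interpret the bytes as integers" step. The helper name bytes_to_u32 is hypothetical. It uses std::memcpy rather than a pointer cast, which sidesteps strict-aliasing and alignment problems. Each value's byte order follows the host CPU.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reinterpret a byte buffer (length divisible by 4)
// as 32-bit unsigned integers. std::memcpy avoids the undefined behaviour
// a raw reinterpret_cast could trigger (strict aliasing, misalignment).
std::vector<uint32_t> bytes_to_u32(const std::vector<uint8_t>& bytes) {
    std::vector<uint32_t> out(bytes.size() / 4);
    if (!out.empty()) {  // skip memcpy on empty buffers (data() may be null)
        std::memcpy(out.data(), bytes.data(), out.size() * sizeof(uint32_t));
    }
    return out;
}
```

Casting an element to int32_t then gives the signed view in [INT_MIN, INT_MAX] that the question describes.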
Maybe I should apply some bitwise logic to discard irrelevant bits in each number (all remaining, unmasked bits will still be uniformly 0 or 1)? If I can find the most significant set bit of my maximum value, I could mask off any bits in my integers that are more significant than that one.
For example, if I want my max to be 17, that is 00010001 in binary. My mask would then be 00011111, which I could apply to every number in my array.
But this mask is wrong... it actually allows values up to 1+2+4+8+16 = 31 :(
What can I do? Also, how do I take care of the min?
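The mask idea can be sketched in a few lines. Both helper names below are hypothetical. The sketch builds the mask branch-free by "smearing" the highest set bit downward. For 17 (0b10001) this yields 0b11111, so a masked value can reach 31, which is exactly the overshoot described above.

```cpp
#include <cstdint>

// Hypothetical helper: smallest all-ones mask covering every bit up to the
// highest set bit of max_value, e.g. 17 (0b10001) -> 31 (0b11111).
// Branch-free: each shift copies the top set bit into lower positions.
uint32_t covering_mask(uint32_t max_value) {
    uint32_t m = max_value;
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;
    return m;
}

// Masking yields values in [0, covering_mask(max_value)], which overshoots
// [0, max_value] whenever max_value is not of the form 2^k - 1.
uint32_t masked(uint32_t x, uint32_t max_value) {
    return x & covering_mask(max_value);
}
```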
Edit
I am generating millions of numbers every frame of my application, for neural networks. I managed to vectorize the code using AVX2 for the float variants (using this post), but I need to get integers working too.
Since the range may not be a power of two, bitmasking is out, but you found that out already.
Modulo is also out: AVX2 has no native integer division or modulo instruction (and even if it did, that wouldn't necessarily make it efficient).
There is another option: multiply-high, using _mm256_mul_epu32 (unfortunately there is no "pure" multiply-high for 32-bit numbers like there is for 16-bit numbers, so we're stuck with an operation that only does 50% useful work). The idea is to take the input number x (full range) and the desired range r (for an inclusive [min, max], r = max - min + 1), then compute r * x / 2^32, where the division is implicit (implemented by taking the high half of the 64-bit product). Interpreted as a rational number, x / 2^32 would lie in [0.0 .. 1.0) (excluding 1.0); multiplying by r then stretches that to [0.0 .. r) (excluding r). That's not how it's calculated, but that's where the formula comes from. Setting the minimum of the range is handled easily by adding min to the scaled result. In code (slightly tested):
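The following is a minimal sketch of the idea. It is written for illustration and is not the answer's original listing. It assumes the inputs are already in a uint32_t array and that r = max - min + 1 fits in 32 bits. The function names are hypothetical. The AVX2 path compiles only when __AVX2__ is defined (for example with -mavx2); otherwise every element goes through the scalar formula.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Scalar multiply-high reduction: maps a full-range 32-bit x into
// [min, min + range) as min + ((x * range) >> 32). The addition is done in
// uint32_t so it wraps exactly like _mm256_add_epi32 does.
inline int32_t reduce_scalar(uint32_t x, uint32_t range, int32_t min) {
    uint32_t hi = static_cast<uint32_t>((static_cast<uint64_t>(x) * range) >> 32);
    return static_cast<int32_t>(static_cast<uint32_t>(min) + hi);
}

#if defined(__AVX2__)
// Eight lanes at once. _mm256_mul_epu32 multiplies only the even 32-bit
// lanes (0, 2, 4, 6) into 64-bit products, so the odd lanes are shifted
// down and multiplied separately; the high halves are then blended.
inline __m256i reduce_avx2(__m256i x, __m256i range, __m256i min) {
    // High halves of the even products, moved into the even lanes.
    __m256i even = _mm256_srli_epi64(_mm256_mul_epu32(x, range), 32);
    // Odd products: their high halves already sit in the odd lanes.
    __m256i odd = _mm256_mul_epu32(_mm256_srli_epi64(x, 32), range);
    __m256i hi = _mm256_blend_epi32(even, odd, 0xAA);  // odd lanes from `odd`
    return _mm256_add_epi32(hi, min);
}
#endif

// out[i] = reduce(in[i]) for i in [0, n); uses AVX2 for blocks of 8 when
// available and the scalar formula for the remainder.
void reduce_range(const uint32_t* in, int32_t* out, std::size_t n,
                  uint32_t range, int32_t min) {
    std::size_t i = 0;
#if defined(__AVX2__)
    const __m256i vr = _mm256_set1_epi32(static_cast<int>(range));
    const __m256i vmin = _mm256_set1_epi32(min);
    for (; i + 8 <= n; i += 8) {
        __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            reduce_avx2(x, vr, vmin));
    }
#endif
    for (; i < n; ++i) out[i] = reduce_scalar(in[i], range, min);
}
```

For the question's inclusive [0, 17], use range = 18: the largest input 0xFFFFFFFF then maps to 17, and 0 maps to 0. The even/odd split is where the "50% useful work" comes from, since each _mm256_mul_epu32 yields only four products.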
It's still an exclusive range, so it cannot produce the full [INT_MIN .. INT_MAX] as an output range. There is no way to even specify it, because that range contains 2^32 values while r must fit in 32 bits; the most it can do is [INT_MIN .. INT_MAX) (or, for example, an equivalent range with zero offset: [0 .. -1)). It's also not perfectly uniform, for the same reason that simple modulo-based range reduction isn't: you just cannot fairly divide N marbles over K bins unless K happens to divide N evenly.
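To see the bias concretely, the same multiply-high reduction can be shrunk to 8-bit inputs so that every input can be enumerated. This scaled-down analogue and the name bin_counts_8bit are illustrations added here, not part of the answer. With N = 256 inputs and K = 17 bins, 256 = 17 * 15 + 1, so exactly one bin receives 16 inputs and the other sixteen receive 15.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 8-bit analogue of the multiply-high reduction: map every
// possible input x in [0, 256) to bin (x * K) >> 8 and count how many
// inputs land in each bin. Bins get either floor(256/K) or ceil(256/K).
template <unsigned K>
std::array<unsigned, K> bin_counts_8bit() {
    std::array<unsigned, K> counts{};
    for (unsigned x = 0; x < 256; ++x) {
        ++counts[(x * K) >> 8];
    }
    return counts;
}
```

The 32-bit case behaves the same way. For range = 18, 2^32 = 18 * 238609294 + 4, so four of the 18 outputs occur once more often than the rest. That is a relative difference of roughly 4e-9, small but not zero.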