I am trying to implement a range mapper for TRNG output files for a C application with ranges of up to 4 bits in size. Due to the pigeonhole bias problem I have settled on using a discard algorithm.
My idea for a parsimonious algorithm would be something like:
-- Read 16 bytes from file and store as an indexed 128 bit unsigned integer bitbucket to be bitmask selected n bits at a time.
-- Predetermine as much as possible the ranges/buckets required for each input and store in an array.
-- For each n bits in the bitbucket select an input from the array that will not discard it if one exists. If 2 bits cannot find an input try 3 bits and if that cannot find an input try with 4 bits. At first when there are many inputs it should be easy not to discard, but as the choice of inputs gets low discards will become more common. I am not entirely sure if it is better to start with fewer bits and work my way up or to do the opposite.
The downside of this bit sipping range mapper seems to be that I need to assume about twice as much random input data as would be required with biased scaling methods. For instance a 9 bucket input from a 4 bit rand output will miss about 43% of the time.
Existing implementations/algorithms: This seems like an example of a more complex and efficient method of parsimonious range mapping but I find his explanation entirely impenetrable. Can anyone explain it to me in English or suggest a book I might read or a university class I might take that would give me a background to understand it?
There is also arc4random which seems to be a runtime optimized unbiased modulo discard implementation. Like almost all unbiased range mapper implementations I have found this seems not to particularly care about how much data it uses. That does not however mean that it is necessarily less data efficient because it has the advantage of fewer misses.
The basic idea of arc4random seems to be that as long as the number of pigeons (max_randvalue_output) is evenly divisible by the number of holes (rangeupperbound) the modulo function itself is an elegant and unbiased range mapper. However modulo only seems to be relevant when you are not bit sipping, i.e. when the output from the random source is more than ceil(log2(buckets)) bits.
There seems to be a tradeoff between the number of 'wasted' random bits and the percentage of discards. The percentage of misses is inversely proportional to the number of excess bits in the input to the range mapper. It seems like there should be a mathematical way to compare the data efficiency of a bit sipping range mapper with a more bit hungry version with fewer misses, but I don't know it.
So my plan is to just write two implementations: a bit sipping parsimonious type of range mapper that may or may not be a little like the mathforum example (which I don't understand) and an invariant byte input modulo range mapper which accepts byte inputs from a TRNG and uses a discard-from-the-top-of-largest-multiple modulo method of debiasing to match (x)n pigeons to n holes which is intended to be like arc4random. When finished I plan to post them on codereview.
I am basically looking for help or advice with any of these issues that might help me to write a more parsimonious but still unbiased range mapper particularly with respect to my parsimonious algorithm. Runtime efficiency is not a priority.
I looked at the "Fast Dice Roller" (FDR) pointed to by @Peter.O, which is indeed simple (and avoids dividing). But each time a random number is generated, this will eat some number of bits and discard the fraction of those bits it does not use.
The "batching"/"pooling" techniques seem to do better than FDR, because the unused fractions of bits are (at least partly) retained.
But interestingly, the DrMath thing you referenced is basically the same as the FDR, but does not start from scratch for each random value it returns.
So the FDR to return 0..n-1 goes:
The DrMath thing goes:
which, as noted, is basically the same as FDR, but keeps track of the unused randomness.
When testing I found:
Where the
costisbits-used / (100000 * log2(3))noting that log2(3) = (1.58496). (So thecostis the number of bits used divided by the number of bits one would hope to use).Also:
And:
where constructed 100000 values, and for each one chose a range in
5..60(inclusive).It seems to me that DrMath has it ! Though for larger ranges it has less of an advantage.
Mind you... DrMath uses at least 2 divisions per random value returned, which gives me conniptions run-time-wise. But you did say you weren't interested in run-time efficiency.
How does it work ?
So, we want a sequence of random values
rto be uniformly distributed in a range0..n-1. Inconveniently, we only have a source of randomness which gives us random values which are uniformly distributed in0..m-1. Typicallymwill be a power of 2 -- and let us assume thatn < m(ifn == mthe problem is trivial, ifn > mthe problem is impossible). For anyrwe can taker MOD nto give a random value in the required range. If we only userwhenr < nthen (trivially) we have the uniform distribution we want. If we only userwhenr < (n * q)and(n * q) < mwe also have a uniform distribution. We are here "rejecting"rwhich are "too big". The fewerrwe reject, the better. So we should chooseqsuch that(n * q) <= m < (n * (q-1))-- son * qis the largest multiple ofnless than or equal tom. This, in turn, tells us thatn"much less" thanmis to be preferred.When we "reject" a given
rwe can throw it all away, but that turns out not to be completely necessary. Also,mdoes not have to be a power of 2. But we will get to that later.Here is some working Python:
Must have
N>= maximum range, preferably much bigger.2**31or2**63are obvious choices.On the first call of
random_drmath()step (2) will read random bits to "fill the pool". ForN = 2**63, will end up withm = 2**63andrwith 63 random bits. Clearlyris random and uniformly distributed in0..m-1. [So far, so good.]Now (and on all further calls of
random_drmath()) we hope to extract a random value uniformly in0..n-1fromr, as discussed above. So -- step (3) -- constructsnqwhich is the largest multiple ofnwhich is less than or equal tom. Ifr >= nqwe cannot use it, because there are fewer thannvalues innq..m-1-- this is the usual "reject" criterion.So, where
r < nqcan return a value -- step (4). The trick here is to think ofmandras numbers "base-n". The ls "digit" ofris extracted (r % n) and returned. Thenmandrare shifted right by one "digit" (q = m // nandr // n), and stored in the "pool". I think that it is clear that at this pointrandmare stillr < mandrrandom and uniformly distributed in0..m-1. Butmis no longer a power of 2 -- but that's OK.But, if
r >= nqmust reducerandmtogether -- step (5) -- and try again. Trivially, could setm = 1 ; r = 0and start again. But what we do is subtractnqfrom bothmandrThat leavesruniformly distributed in0..m-1. This last step feels like magic, but we know thatris innq..m-1and each possible value has equal probability, sor-nqis in the range0..m-nq-1and each possible value still has equal probability ! [Remember that the 'invariant' at the top of thewhileloop is thatris random and uniformly distributed in0..m-1.]For small
nthe rejection step will discard most ofr, but for smalln(compared toN) we hope not to reject very often. Conversely, for largen(compared toN) we may expect to reject more often, but this retains at least some of the random bits we have eaten so far. I feel there might be a way to retain more ofr... but a haven't thought of a simple way to do that... and if the cost of reading one random bit is high, it might be worth trying to find a not-simple way !FWIW: setting
N = 128I get:so as
napproachesNthe cost per value goes up.