WARNING The author of this routine has been writing random-number generators for many years and has never been known to write one that worked.
Unix Third Edition (V3) manual page
What?
Select a Uniformly Distributed random number from interval [0,s), and this interval may change dynamically.
We will assume that RNG will output a uniformly distributed random number in [0,232−1]
Biased because rand() produces numbers in the range [0..2^32), for numbers that can not perfectly divide 2^32, some of them have a lower change to be selected.
Eg: 52 does not perfectly divide 2^32, and 2^32 mod 52 = 48, so 49..52 have a lower change to be selected.
One other concern we might have is that some generators have weak low-order bits. That is, the lower bit does not pass the statistical tests (for every bit it must appear 50/50). When we perform % 52, we are essentially passing the lowest bit straight through the output.
Note that 0x1.0p-32 is a binary floating point constant for 2−32
This approach is just as biased as the classic modulo approach, but the bias manifests itself differently. For example, if we were choosing numbers in the range [0..52), the numbers 0, 13, 26 and 39 would appear once less often than the others.
That is, it does not uniformly map input to an even output.
Int Multiply
uint32_t bounded_rand(rng_t& rng, uint32_t range) { uint32_t x = rng(); uint64_t m = uint64_t(x) * uint64_t(range); return m >> 32;}
It might appear that this version requires 64-bit arithmetic, but on x86 CPUs, a good compiler will compile this code to a 32-bit mult instruction (which yields two 32-bit outputs, one of which is the return value). We should expect this version to be fast, but biased in exactly the same way as the floating-point multiplication method.
Biased for the exact same reason as the floating-point multiplication method.
Unbiased
The most common method to achieve fairness is through rejection.
Division with Rejection
Instead of calculating x * range / 2**32, we calculate x / (2**32 / range), then reject the uneven condition, essentially reject the last skewed biased (last 48 values).
For example, in the case of drawing a card using a 32-bit PRNG, we can generate a 32-bit number and divide it by 2^32/52 = 82,595,524 to choose our card. This method works when the random value from the 32-bit PRNG is less than 52 × 82595524 = 2^32/32 – 48. Since division do not have remainder.
If the random value from the PRNG is one of the final 48 values at the top end of the range of the generator, it must be rejected and another sought.
uint32_t bounded_rand(rng_t& rng, uint32_t range) { // calculates divisor = 2**32 / range uint32_t divisor = ((-range) / range) + 1; if (divisor == 0) // overflow, it's really 2**32 return 0; for (;;) { uint32_t val = rng() / divisor; if (val < range) return val; }}
Note that range is unsigned. For an unsigned integer, the negation of a number n is 232−n.
Dividing this by the range gives a value just less than 232/range, so we add 1 to it to get the correct divisor.
Debiased Modulo (Twice)
Reject the first 48 value (that are biased)
uint32_t bounded_rand(rng_t& rng, uint32_t range) { // calculates 2**32 % range uint32_t t = (-range) % range; for (;;) { uint32_t r = rng(); if (r >= t) return r % range; }}
Java adopts a different approach to generating a number in a range that manages to use only one modulo operation except in the relatively rare occasions when rejection occurs. The code is
static uint32_t bounded_rand(rng_t& rng, uint32_t range) { uint32_t x, r; do { x = rng(); r = x % range; } while (x - r > (-range)); return r;}
It may take a little thought to figure out why this variant works. Unlike the previous version based on remainders, which removes bias by removing some of the lowest values from the underlying generation engine, this version first uses x - r to make it multiple of range, then filter out value that lands in 2**32 - range as before to make it unbiased.
Debiased Integer Multiplication — Lemire’s Method
Much as we removed bias from the modulo technique, we can also remove bias from the integer multiplication method. This method was devised by Lemire.
The highlight is that it does not require any modulo arithmetic, which makes it use less CPU cycles and should be fast enough. This is also the algorithm used by Go’s rand package.
uint32_t bounded_rand(rng_t& rng, uint32_t range) { uint32_t t = (-range) % range; do { uint32_t x = rng(); uint64_t m = uint64_t(x) * uint64_t(range); uint32_t l = uint32_t(m); } while (l < t); return m >> 32;}
Bitmask with Rejection (Unbiased) — Apple’s Method
Our final approach avoids division and remainder operations entirely. Instead, it uses a simple masking operation to get a random number in the range [0..2^k) where k is the smallest value such that 2^k is greater than the range. If the value is too large for our range, we discard and try another. Code below:
uint32_t bounded_rand(rng_t& rng, uint32_t range) { uint32_t mask = ~uint32_t(0); // all bits to 1 --range; // Decrement the range by 1 because the range [0..range-1] needs to be inclusive. // clz: count leading zero // range|1 so we have at least 1 bit mask // right shift effectively creating a mask that matches the bit length of range. mask >>= __builtin_clz(range|1); uint32_t x; do { // since rng() return bits uniformly, x should also have uniformly distributed bits x = rng() & mask; } while (x > range); return x;}
This approach was adopted by Apple when (in the macOS Sierra release) they made their own revision to the code for arc4random_uniform.
Real World Cases
Go: Seeding Responsibility
Long story short, for modern implementation we should seed it automatically.
Long story:
A bigger problem with Seed is that responsibility for seeding the global generator was unclear. Most users don’t use Source and Rand directly. Instead, the math/rand package provides a global generator accessed by top-level functions like Intn. Following the C standard library, the global generator defaults to behaving as if Seed(1) is called at startup. This is good for repeatability but bad for programs that want their random outputs to be different from one run to the next. The package documentation suggests using rand.Seed(time.Now().UnixNano()) in that case, to make the generator’s output time-dependent, but what code should do this?
Probably the main package should be in charge of how math/rand is seeded: it would be unfortunate for imported libraries to configure global state themselves, since their choices might conflict with other libraries or the main package. But what happens if a library needs some random data and wants to use math/rand? What if the main package doesn’t even know math/rand is being used? We found that in practice many libraries add init functions that seed the global generator with the current time, “just to be sure”.
Library packages seeding the global generator themselves causes a new problem. Suppose package main imports two packages that both use math/rand: package A assumes the global generator will be seeded by package main, but package B seeds it in an init func. And suppose that package main doesn’t seed the generator itself. Now package A’s correct operation depends on the coincidence that package B is also imported in the program. If package main stops importing package B, package A will stop getting random values. We observed this happening in practice in large codebases.
In retrospect, it was clearly a mistake to follow the C standard library here: seeding the global generator automatically would remove the confusion about who seeds it, and users would stop being surprised by repeatable output when they didn’t want that.
Linear Congruential Generator
Linear Congruential Generator
Benefits:
Constants can be chosen such that they emit every possible output value once before repeating
Problems:
The high bits of the state do not affect the low bits at all
It shuffles an array of size n so that n! possible permutations are equiprobable.
Require: array A made of n elements indexed from 0 to n − 1
1: for i = (n − 1)..1 do
2: j ← random integer in [0, i]
3: exchange A[i] and A[j]
4: end for
We count i down because we need a new seed every time we call rand
Proof
We start with an array x = [1, 2. ..., N]
There are N! possible permutation.
We consider an arbitrary array x' = [b1, b2, ..., bN], where b1, ..., bN] are distinct integers between 1 and n.
Intuition: Every element gets shuffled
What’s the Probability that x transferred to x’ ? N!1
It’s a nicer? simpler? solution to the shuffling problem:
Loop through each card in the deck.
Swap the current card with another randomly chosen card.
// CAVET: BIASEDfor (int i = 0; i < cards.Length; i++){ int n = rand.Next(cards.Length); Swap(ref cards[i], ref cards[n]);}
The evil lies in rand.Next, where it is always rand.Next(3), giving us 33=27 results. But it should only have 3! results.
Compared to the original method.
for (int i = cards.Length - 1; i > 0; i--){ int n = rand.Next(i + 1); Swap(ref cards[i], ref cards[n]);}