WARNING The author of this routine has been writing random-number generators for many years and has never been known to write one that worked.
Unix Third Edition (V3) manual page
What?
Select a Uniformly Distributed random number from interval , and this interval may change dynamically.
We will assume that RNG will output a uniformly distributed random number in
To do it correctly in C++.
Biased Approach
Classic Modulo
- Biased because
rand()
produces numbers in the range[0..2^32)
, for numbers that can not perfectly divide2^32
, some of them have a lower change to be selected. - Eg: 52 does not perfectly divide
2^32
, and2^32 mod 52 = 48
, so49..52
have a lower change to be selected. - One other concern we might have is that some generators have weak low-order bits. That is, the lower bit does not pass the statistical tests (for every bit it must appear 50/50). When we perform
% 52
, we are essentially passing the lowest bit straight through the output.
Floating Point Multiply
- Note that
0x1.0p-32
is a binary floating point constant for - This approach is just as biased as the classic modulo approach, but the bias manifests itself differently. For example, if we were choosing numbers in the range
[0..52)
, the numbers 0, 13, 26 and 39 would appear once less often than the others. - That is, it does not uniformly map input to an even output.
Int Multiply
- It might appear that this version requires 64-bit arithmetic, but on x86 CPUs, a good compiler will compile this code to a 32-bit
mult
instruction (which yields two 32-bit outputs, one of which is the return value). We should expect this version to be fast, but biased in exactly the same way as the floating-point multiplication method. - Biased for the exact same reason as the floating-point multiplication method.
Unbiased
The most common method to achieve fairness is through rejection.
Division with Rejection
Instead of calculating x * range / 2**32
, we calculate x / (2**32 / range)
, then reject the uneven condition, essentially reject the last skewed biased (last 48 values).
For example, in the case of drawing a card using a 32-bit PRNG, we can generate a 32-bit number and divide it by 2^32/52 = 82,595,524
to choose our card. This method works when the random value from the 32-bit PRNG is less than 52 × 82595524 = 2^32/32 – 48
. Since division do not have remainder.
If the random value from the PRNG is one of the final 48 values at the top end of the range of the generator, it must be rejected and another sought.
- Note that range is unsigned. For an unsigned integer, the negation of a number n is .
- Dividing this by the range gives a value just less than , so we add 1 to it to get the correct divisor.
Debiased Modulo (Twice)
- Reject the first 48 value (that are biased)
The code for OpenBSD’s arc4random_uniform
(which is also used on OS X and iOS) adopts this strategy.
Debiased Modulo (Once)
Java adopts a different approach to generating a number in a range that manages to use only one modulo operation except in the relatively rare occasions when rejection occurs. The code is
It may take a little thought to figure out why this variant works. Unlike the previous version based on remainders, which removes bias by removing some of the lowest values from the underlying generation engine, this version first uses x - r
to make it multiple of range
, then filter out value that lands in 2**32 - range
as before to make it unbiased.
Debiased Integer Multiplication — Lemire’s Method
Much as we removed bias from the modulo technique, we can also remove bias from the integer multiplication method. This method was devised by Lemire.
The highlight is that it does not require any modulo arithmetic, which makes it use less CPU cycles and should be fast enough. This is also the algorithm used by Go’s rand
package.
Bitmask with Rejection (Unbiased) — Apple’s Method
Our final approach avoids division and remainder operations entirely. Instead, it uses a simple masking operation to get a random number in the range [0..2^k)
where k is the smallest value such that 2^k
is greater than the range. If the value is too large for our range, we discard and try another. Code below:
This approach was adopted by Apple when (in the macOS Sierra release) they made their own revision to the code for arc4random_uniform
.
Real World Cases
Go: Seeding Responsibility
Long story short, for modern implementation we should seed it automatically.
- Long story:
A bigger problem with
Seed
is that responsibility for seeding the global generator was unclear. Most users don’t useSource
andRand
directly. Instead, themath/rand
package provides a global generator accessed by top-level functions likeIntn
. Following the C standard library, the global generator defaults to behaving as ifSeed(1)
is called at startup. This is good for repeatability but bad for programs that want their random outputs to be different from one run to the next. The package documentation suggests usingrand.Seed(time.Now().UnixNano())
in that case, to make the generator’s output time-dependent, but what code should do this?Probably the main package should be in charge of how
math/rand
is seeded: it would be unfortunate for imported libraries to configure global state themselves, since their choices might conflict with other libraries or the main package. But what happens if a library needs some random data and wants to usemath/rand
? What if the main package doesn’t even knowmath/rand
is being used? We found that in practice many libraries add init functions that seed the global generator with the current time, “just to be sure”.Library packages seeding the global generator themselves causes a new problem. Suppose package main imports two packages that both use
math/rand
: package A assumes the global generator will be seeded by package main, but package B seeds it in aninit
func. And suppose that package main doesn’t seed the generator itself. Now package A’s correct operation depends on the coincidence that package B is also imported in the program. If package main stops importing package B, package A will stop getting random values. We observed this happening in practice in large codebases.In retrospect, it was clearly a mistake to follow the C standard library here: seeding the global generator automatically would remove the confusion about who seeds it, and users would stop being surprised by repeatable output when they didn’t want that.
Linear Congruential Generator
Linear Congruential Generator
- Benefits:
- Constants can be chosen such that they emit every possible output value once before repeating
- Problems:
- The high bits of the state do not affect the low bits at all
Algorithm
Links
Link to original
- Linear congruential generator - Wikipedia
- Random number generators: good ones are hard to find | Communications of the ACM
Random Shuffle: Knuth Shuffle
Knuth Shuffle
It shuffles an array of size so that possible permutations are equiprobable.
Require: array A made of n elements indexed from 0 to n − 1 1: for i = (n − 1)..1 do 2: j ← random integer in [0, i] 3: exchange A[i] and A[j] 4: end for
- We count
i
down because we need a new seed every time we callrand
Proof
- We start with an array
x = [1, 2. ..., N]
- There are possible permutation.
- We consider an arbitrary array
x' = [b1, b2, ..., bN]
, whereb1, ..., bN]
are distinct integers between 1 and n.
- Intuition: Every element gets shuffled
- What’s the Probability that x transferred to x’ ?
- QED
Optimize?
It’s a nicer? simpler? solution to the shuffling problem:
- Loop through each card in the deck.
- Swap the current card with another randomly chosen card.
The evil lies in
Link to originalrand.Next
, where it is alwaysrand.Next(3)
, giving us results. But it should only have results. Compared to the original method.
PRNG VS RNG
- Pseudo: the input space (seed) is limited. Thus, we can predict the result.
- CPRNG: crypto-strength version of
PRNG
. It takes a 256-bit seed rather than a 32-bit seed.
Links
- security - How are pseudorandom and truly random numbers different and why does it matter? - Super User
- permutation - Is java.util.Random really that random? How can I generate 52! (factorial) possible sequences? - Stack Overflow
- Efficiently Generating a Number in a Range | PCG, A Better Random Number Generator
- [1805.10941] Fast Random Integer Generation in an Interval
- Linear Congruential Generator
- Linear Feedback Shift Register
- Random Sampling
- ChaCha
- Fast inverse square root
- Mersenne Twister