arc4random_uniform 和 PCG 中的均匀分布
Uniform distribution in arc4random_uniform and PCG
来自 OpenBSD
的 arc4random_uniform
和 Melissa O'Neill
的 PCG
库都具有类似的算法来生成不超过排除上限的无偏差无符号整数值.
inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
uint64_t bound) {
uint64_t threshold = -bound % bound;
for (;;) {
uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
if (r >= threshold)
return r % bound;
}
}
-bound % bound
不总是零吗?如果它始终为零,那么为什么还要循环和 if 语句?
OpenBSD
也有同样的事情。
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
uint32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
Apple 版本 arc4random_uniform
有不同的版本。
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return (0);
#if (ULONG_MAX > 0xffffffffUL)
min = 0x100000000UL % upper_bound;
#else
/* Calculate (2**32 % upper_bound) avoiding 64-bit math */
if (upper_bound > 0x80000000)
min = 1 + ~upper_bound; /* 2**32 - upper_bound */
else {
/* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
}
#endif
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return (r % upper_bound);
}
因为 bound
是 uint64_t
,-bound
的计算模数为 264。结果是264−bound
,不是−bound
.
然后-bound % bound
计算264−bound
模bound
的余数。这等于 264 模 bound
.
的余数
通过将 threshold
设置为此并拒绝小于 threshold
的数字,例程将接受的间隔减少到 264−threshold
数字。结果是一个区间,其中的数字数量是 bound
.
的倍数
从在该时间间隔内选择的数字 r
,例程 returns r % bound
。由于间隔的修剪,每个残基的出现次数相等,因此结果对任何残基都没有偏差。
来自 OpenBSD
的 arc4random_uniform
和 Melissa O'Neill
的 PCG
库都具有类似的算法来生成不超过排除上限的无偏差无符号整数值.
inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
uint64_t bound) {
uint64_t threshold = -bound % bound;
for (;;) {
uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
if (r >= threshold)
return r % bound;
}
}
-bound % bound
不总是零吗?如果它始终为零,那么为什么还要循环和 if 语句?
OpenBSD
也有同样的事情。
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
uint32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
Apple 版本 arc4random_uniform
有不同的版本。
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return (0);
#if (ULONG_MAX > 0xffffffffUL)
min = 0x100000000UL % upper_bound;
#else
/* Calculate (2**32 % upper_bound) avoiding 64-bit math */
if (upper_bound > 0x80000000)
min = 1 + ~upper_bound; /* 2**32 - upper_bound */
else {
/* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
}
#endif
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return (r % upper_bound);
}
因为 bound
是 uint64_t
,-bound
的计算模数为 264。结果是264−bound
,不是−bound
.
然后-bound % bound
计算264−bound
模bound
的余数。这等于 264 模 bound
.
通过将 threshold
设置为此并拒绝小于 threshold
的数字,例程将接受的间隔减少到 264−threshold
数字。结果是一个区间,其中的数字数量是 bound
.
从在该时间间隔内选择的数字 r
,例程 returns r % bound
。由于间隔的修剪,每个残基的出现次数相等,因此结果对任何残基都没有偏差。