arc4random_uniform 和 PCG 中的均匀分布

Uniform distribution in arc4random_uniform and PCG

来自 OpenBSDarc4random_uniformMelissa O'NeillPCG 库都具有类似的算法来生成不超过排除上限的无偏差无符号整数值.

inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
                                        uint64_t bound) {
    uint64_t threshold = -bound % bound;
    for (;;) {
        uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}

-bound % bound 不总是零吗?如果它始终为零,那么为什么还要循环和 if 语句?

OpenBSD也有同样的事情。

uint32_t
arc4random_uniform(uint32_t upper_bound)
{
    uint32_t r, min;

    if (upper_bound < 2)
        return 0;

    /* 2**32 % x == (2**32 - x) % x */
    min = -upper_bound % upper_bound;

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return r % upper_bound;
}

Apple 版本 arc4random_uniform 有不同的版本。

u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
    u_int32_t r, min;

    if (upper_bound < 2)
        return (0);

#if (ULONG_MAX > 0xffffffffUL)
    min = 0x100000000UL % upper_bound;
#else
    /* Calculate (2**32 % upper_bound) avoiding 64-bit math */
    if (upper_bound > 0x80000000)
        min = 1 + ~upper_bound;     /* 2**32 - upper_bound */
    else {
        /* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
        min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
    }
#endif

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return (r % upper_bound);
}

因为 bounduint64_t-bound 的计算模数为 264。结果是264bound,不是−bound.

然后-bound % bound计算264boundbound的余数。这等于 264bound.

的余数

通过将 threshold 设置为此并拒绝小于 threshold 的数字,例程将接受的间隔减少到 264threshold 数字。结果是一个区间,其中的数字数量是 bound.

的倍数

从在该时间间隔内选择的数字 r,例程 returns r % bound。由于间隔的修剪,每个残基的出现次数相等,因此结果对任何残基都没有偏差。