Is this attempt to implement real randomness valid?

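Pseudo-randomness becomes real randomness when the sequence of generated values has no actual pattern; so essentially, the sequence of random elements would have to run for a potentially infinite length before repeating itself.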

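I know that the way random.py seed()s is designed to get as far as possible from the 'pseudo' character (i.e. by using the current timestamp, machine parameters, etc.), which is fine for the majority of cases. But what if one needs to make mathematically sure that predictability is zero?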

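I've read that real randomness can be achieved when we seed() based on specific physical events such as radioactive decay. But what if, for example, I used an array derived from a recorded audio stream?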

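The following is an example of how I'm overriding the default random.seed() behavior for this purpose. I'm using the sounddevice library, which implements bindings to the services responsible for managing I/O sound devices.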

# original random imports here
# ...

from sounddevice import rec

__all__ = ["module level functions here"]

# original random constants here
# ...

# sounddevice related constants
# ----------------------------------------------------------------------
# FS: Sampling Frequency in Hz (samples per second);
# DURATION: Duration of the recorded audio stream (seconds);
# *Note: increasing the duration makes seeding slower, since
# the seed method must wait for the entire stream to be recorded
# before processing further.
# CHANNELS: N° of audio channels used by the recording function (_rec);
# DTYPE: Data type of the np.ndarray returned by _rec;
# *Note: dtype can also be a np.dtype object. E.g., np.dtype("float64").

FS = 48000 
DURATION = 0.1
CHANNELS = 2 
DTYPE = 'float64'


# ----------------------------------------------------------------------
# The class implements a custom random generator with a seed obtained
# through the default audio input device.
# It's a subclass of random.Random that overrides only the seed method;
# it records an audio stream with the default parameters and returns the
# content in a newly created np.ndarray.
# Then the array's elements are added together and some transformations
# are performed on the sum, in order to obtain a less uniform float.
# This operation causes the randomness to concern the decimal part in
# particular, which is subject to high fluctuation, even when the noise
# of the surrounding environment is homogeneous over time.
# *Note: the blocking parameter suspends the execution until the entire
# stream is recorded, otherwise the np array will be partially empty.
# *Note: when the seed argument is specified and different than None,
# SDRandom will behave exactly like its superclass

class SDRandom(Random):

    def seed(self, a=None, version=2):
        if a is None:
            stream = rec(frames=round(FS * DURATION),
                         samplerate=FS,
                         channels=CHANNELS,
                         dtype=DTYPE,
                         blocking=True
                         )

            # Sum and Standard Deviation of the flattened ndarray.
            sum_, std_ = stream.sum(), stream.std() 

            # round() determines the result's sign.
            b = sum_ - round(sum_)

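            # Collecting a number of exponents based on the std's digits.
            # Non-digit characters ('.', 'e', '-') are skipped so that int()
            # cannot fail on values such as 1.5 or 9.5e-05.
            e = [1 if int(c) % 2 else -1
                 for c in str(std_).strip("0.") if c.isdigit()]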

            a = b * 10 ** sum(e)

        super().seed(a, version)


# ----------------------------------------------------------------------
# Create one instance, seeded from an audio stream, and export its
# methods as module-level functions.
# The functions share state across all uses.

_inst = SDRandom()
# binding class methods to module level functions here
# ...

## ------------------------------------------------------
## ------------------ fork support  ---------------------

if hasattr(_os, "fork"):
    _os.register_at_fork(after_in_child=_inst.seed)


if __name__ == '__main__':
    _test() # See random._test() definition.

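According to the theory, my implementation still doesn't achieve real randomness. How is that possible? How could audio input ever be deterministic, even considering the following?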

This operation causes the randomness to concern the decimal part in particular, which is subject to high fluctuation, even when the noise of the surrounding environment is homogeneous over time.

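You'd be better off just using the secrets module for "real" randomness. It provides you with data from your kernel's CSPRNG, which should be continually collecting and mixing in new entropy, in a way designed to make life very difficult for any attacker.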
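As a minimal sketch of what that looks like in practice, the snippet below uses only documented functions from the standard-library secrets module. The variable names (`token`, `roll`, `colours`, `deck`) are illustrative and not part of the question's code.

```python
import secrets

# 32 bytes straight from the operating system's CSPRNG.
token = secrets.token_bytes(32)

# A uniformly distributed integer in [0, 6), shifted to a die roll in 1..6.
roll = secrets.randbelow(6) + 1

# A uniformly chosen element of a non-empty sequence.
colours = ["red", "green", "blue"]
pick = secrets.choice(colours)

# secrets.SystemRandom offers the familiar random.Random interface
# (random(), shuffle(), ...) but draws every value from the OS.
# It keeps no internal state, so calling seed() on it has no effect.
sysrand = secrets.SystemRandom()
x = sysrand.random()
deck = list(range(10))
sysrand.shuffle(deck)

print(token.hex(), roll, pick, x, deck)
```

Because SystemRandom exposes the same methods as random.Random, it can stand in for a custom subclass such as SDRandom without overriding seed() at all.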

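Your use of "infinite" isn't appropriate either: you can't run something "infinitely long"; the heat death of the universe will happen long before that.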

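Using the standard Mersenne Twister (as Python's random module does) also seems inappropriate, because an attacker can recover its internal state after observing 624 consecutive 32-bit variates. Using a CSPRNG makes this much harder, and continually mixing in new state, as your kernel likely does, hardens it further.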
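The state-recovery claim can be illustrated with a short sketch. It assumes the attacker sees 624 consecutive full 32-bit outputs (here obtained with `getrandbits(32)`); recovering the state from truncated values such as `random()` floats takes more work and is not shown. The helper names (`untemper`, `clone_from_outputs`) are made up for this example, and the state layout passed to `setstate()` is the one CPython's `getstate()` returns, so treat it as an implementation detail rather than a guaranteed API.

```python
import random


def _undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_left_shift_xor(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y: int) -> int:
    """Reverse MT19937's output tempering to get the raw state word."""
    y = _undo_right_shift_xor(y, 18)
    y = _undo_left_shift_xor(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y


def clone_from_outputs(outputs: list[int]) -> random.Random:
    """Build a generator that continues the observed output stream."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = tuple(untemper(o) for o in outputs)
    clone = random.Random()
    # CPython state layout: (version, 624 words + position index, gauss_next).
    # Index 624 means "all words consumed", so the next draw regenerates.
    clone.setstate((3, words + (624,), None))
    return clone


victim = random.Random()  # seeded from the OS; the seed is never inspected
observed = [victim.getrandbits(32) for _ in range(624)]

clone = clone_from_outputs(observed)
predicted = [clone.random() for _ in range(5)]
actual = [victim.random() for _ in range(5)]
print(predicted == actual)
```

Note that the quality of the seed plays no role here: however unpredictable the audio-derived seed is, the generator's output reveals its state.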

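Finally, treating the samples as floats and then reducing them to a sum and standard deviation doesn't seem appropriate. You'd be better off keeping them as integers and just passing them through a cryptographic hash. For example: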

import hashlib
import random

import sounddevice as sd

samples = sd.rec(
    frames=1024,
    samplerate=48000,
    channels=2,
    dtype='int32',
    blocking=True,
)

rv = int.from_bytes(hashlib.sha256(samples).digest(), 'little')
print(rv)

random.seed(rv)
print(random.random())
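To see why a hash is preferable to a sum, here is a small sketch that runs without a microphone. The synthetic `fake` buffer merely stands in for recorded int32 samples, and `seed_from_samples` is a hypothetical helper name; only the hashing step mirrors the code above.

```python
import hashlib
import random
from array import array


def seed_from_samples(samples) -> int:
    """Hash a buffer of integer samples into a 256-bit integer seed."""
    return int.from_bytes(hashlib.sha256(samples.tobytes()).digest(), "little")


# Synthetic stand-in for 1024 stereo frames of signed integer audio.
gen = random.Random(0)
fake = array("i", (gen.randrange(-1000, 1000) for _ in range(2048)))
fake[0], fake[1] = 1, 2  # make the first two samples differ

# Swapping two samples leaves the sum unchanged but changes the hash.
swapped = array("i", fake)
swapped[0], swapped[1] = swapped[1], swapped[0]
print(sum(swapped) == sum(fake))
print(seed_from_samples(swapped) == seed_from_samples(fake))

# Flipping a single bit of a single sample changes many bits of the seed.
flipped = array("i", fake)
flipped[10] ^= 1
changed = bin(seed_from_samples(fake) ^ seed_from_samples(flipped)).count("1")
print(changed, "of 256 seed bits changed")
```

A sum discards ordering and most of the per-sample detail, whereas the digest depends on every bit of every sample.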

But then again, please just use secrets; it's a much better option.

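Note: recent versions of the Linux, Windows, OSX, FreeBSD, and OpenBSD kernels all work the way I described above. They make a good attempt at collecting entropy and mix it into a CSPRNG in a sensible way; see, for example, Fortuna.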