Correctly setting random seeds for repeatability

Setting the random seed with the Fortran 90 subroutine random_seed is quite simple.

call random_seed( put=seed )

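But I can't find any guidelines for choosing the seed (which is absolutely necessary when you want repeatability). Folklore I've heard in the past suggests that scalar seeds should be large, e.g. 123456789 is a better seed than 123. The only support for this I can find online is the suggestion to use a "large, odd integer value" for the ifort extension function ran().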

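I understand this may be implementation-specific. I am using gfortran 4.8.5, but I'm also interested in ifort and (if possible) general guidelines that are independent of implementation. Here's some example code: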

! for compactness, assume seed size of 4, but it will depend on
! the implementation (e.g. for my version of gfortran 4.8.5 it is 12)

seed1(1:4) = [ 123456789, 987654321, 456789123, 789123456 ]
seed2(1:4) =   123456789
seed3(1:4) = [         1,         2,         3,         4 ]

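I'd guess that seed1 is fine, but it is pretty verbose if you're setting it manually (as I am), since the seed length can be 12 or 33 or whatever. And I'm not even sure it's fine, because I haven't been able to find any guidelines at all on setting these seeds. For all I know, the seeds should be negative, or 3-digit even numbers, etc., although I guess you'd hope the implementation would warn you about that (?).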

seed2 and seed3 are obviously more convenient to set and, for all I know, are just as good. @Ross suggests in his answer that seed2 is in fact fine.

So my question, in summary, is: how do I correctly set the seed? Are any or all of seed1 to seed3 acceptable?

Guidelines for setting the seed depend on the PRNG algorithm that RANDOM_NUMBER uses, but in general the more "entropy" you provide, the better.

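If you have a single scalar value, you can use some simple PRNG to expand it to the full seed array required by RANDOM_SEED. See e.g. the function lcg in the example code at https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gfortran/RANDOM_005fSEED.html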
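As a rough illustration of expanding one scalar into a full seed array, here is a minimal sketch modeled on the lcg function from the gfortran documentation example linked above. The program name, the starting value 20180815, and the exact LCG constants are illustrative assumptions, not a recommendation; it also assumes the default integer kind can hold values up to huge(0).

```fortran
program expand_seed
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none

   integer :: n, i
   integer, allocatable :: seed(:)
   integer(int64) :: s
   real :: x(3)

   ! Ask the implementation how long the seed array must be
   call random_seed(size=n)
   allocate(seed(n))

   ! Single user-chosen scalar (hypothetical value) that drives the whole seed
   s = 20180815_int64
   do i = 1, n
      seed(i) = lcg(s)   ! each call advances s and returns a new element
   end do

   call random_seed(put=seed)
   call random_number(x)
   write(*,*) 'seed size:   ', n
   write(*,*) 'first draws: ', x

contains

   ! Simple linear congruential generator used only to fill the seed array.
   ! It is not meant to be a high-quality PRNG on its own.
   function lcg(s) result(r)
      integer(int64), intent(inout) :: s
      integer :: r
      if (s == 0_int64) then
         s = 104729_int64
      else
         s = mod(s, 4294967296_int64)
      end if
      s = mod(s * 279470273_int64, 4294967291_int64)
      r = int(mod(s, int(huge(0), int64)), kind(0))
   end function lcg

end program expand_seed
```

Because the seed length is queried at run time, the same scalar works whether the implementation wants 12, 33, or some other number of elements, and rerunning with the same scalar on the same compiler should reproduce the same sequence.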

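Current versions of GFortran have some protection against bad seeds and should be relatively immune to "dumb" seeding (e.g. all values of seed(:) identical, or all values small or even zero), but for portability to other compilers, following something like what I suggest above is probably still a good idea.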

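What you provide to random_seed( put=... ) is used to determine the starting state of the generator, which (as janneb states) should have as much entropy as reasonably possible. You can construct some relatively sophisticated method of generating this entropy; grabbing it from the system somehow is a good choice. The code janneb links to is a good example.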

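However, I typically want to be able to reproduce a single run from a given seed if necessary. This is useful for debugging and regression testing. Then, for production runs, the code can pull a single seed 'randomly' somehow. Therefore, I want to get good random numbers from a single 'seed'. In my experience, this is easily accomplished by providing this single seed and then letting the generator add entropy by generating numbers. Consider the following example: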

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer, parameter :: n_discard = 100

   integer :: state_size, i
   integer, allocatable, dimension(:) :: state
   real(wp) :: ran, oldran

   call random_seed( size=state_size )
   write(*,*) '-- state size is: ', state_size

   allocate(state(state_size))

   ! -- Simple method of initializing seed from single scalar
   state = 20180815
   call random_seed( put=state )

   ! -- 'Prime' the generator by pulling the first few numbers
   ! -- In reality, these would be discarded but I will print them for demonstration
   ran = 0.5_wp
   do i=1,n_discard
      oldran = ran
      call random_number(ran)
      write(*,'(a,i3,2es26.18)') 'iter, ran, diff: ', i, ran, ran-oldran
   enddo

   ! Now the RNG is 'ready'
end program main

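Here, I give a seed and then generate a random number 100 times. Typically, I would discard these initial, potentially corrupt numbers. In this example, I am printing them to see whether they look non-random. Running with PGI 15.10: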

enet-mach5% pgfortran --version

pgfortran 15.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
enet-mach5% pgfortran main.f90 && ./a.out
 -- state size is:            34
iter, ran, diff:   1  8.114813341476008191E-01  3.114813341476008191E-01
iter, ran, diff:   2  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   3  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   4  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   5  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   6  2.172220012214012286E-01 -5.942593329261995905E-01
iter, ran, diff:   7  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   8  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   9  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  10  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  11  6.229626682952016381E-01  4.057406670738004095E-01
iter, ran, diff:  12  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  13  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  14  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  15  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  16  2.870333536900204763E-02 -5.942593329261995905E-01
iter, ran, diff:  17  2.870333536900204763E-02  0.000000000000000000E+00
iter, ran, diff:  18  4.344440024428024572E-01  4.057406670738004095E-01
iter, ran, diff:  19  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  20  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  21  8.401846695166028667E-01  4.057406670738004095E-01
iter, ran, diff:  22  8.401846695166028667E-01  0.000000000000000000E+00
iter, ran, diff:  23  6.516660036642036857E-01 -1.885186658523991809E-01
iter, ran, diff:  24  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  25  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  26  5.740667073800409526E-02 -5.942593329261995905E-01
iter, ran, diff:  27  5.740667073800409526E-02  0.000000000000000000E+00
iter, ran, diff:  28  2.746286719594053238E-01  2.172220012214012286E-01
iter, ran, diff:  29  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  30  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  31  6.803693390332057334E-01  4.057406670738004095E-01
iter, ran, diff:  32  6.803693390332057334E-01  0.000000000000000000E+00
iter, ran, diff:  33  3.033320073284073715E-01 -3.770373317047983619E-01
iter, ran, diff:  34  3.033320073284073715E-01  0.000000000000000000E+00
iter, ran, diff:  35  7.090726744022077810E-01  4.057406670738004095E-01
iter, ran, diff:  36  1.148133414760081905E-01 -5.942593329261995905E-01
iter, ran, diff:  37  1.148133414760081905E-01  0.000000000000000000E+00
iter, ran, diff:  38  1.435166768450102381E-01  2.870333536900204763E-02
iter, ran, diff:  39  1.435166768450102381E-01  0.000000000000000000E+00
iter, ran, diff:  40  3.607386780664114667E-01  2.172220012214012286E-01
iter, ran, diff:  41  7.664793451402118762E-01  4.057406670738004095E-01
iter, ran, diff:  42  7.664793451402118762E-01  0.000000000000000000E+00
iter, ran, diff:  43  2.009233475830143334E-01 -5.655559975571975428E-01
iter, ran, diff:  44  2.009233475830143334E-01  0.000000000000000000E+00
iter, ran, diff:  45  6.353673500258167905E-01  4.344440024428024572E-01
iter, ran, diff:  46  4.110801709961720007E-02 -5.942593329261995905E-01
iter, ran, diff:  47  4.110801709961720007E-02  0.000000000000000000E+00
iter, ran, diff:  48  8.812926866162200668E-01  8.401846695166028667E-01
iter, ran, diff:  49  8.812926866162200668E-01  0.000000000000000000E+00
iter, ran, diff:  50  9.386993573542241620E-01  5.740667073800409526E-02
iter, ran, diff:  51  3.444400244280245715E-01 -5.942593329261995905E-01
iter, ran, diff:  52  7.501806915018249811E-01  4.057406670738004095E-01
iter, ran, diff:  53  9.961060280922282573E-01  2.459253365904032762E-01
iter, ran, diff:  54  9.961060280922282573E-01  0.000000000000000000E+00
iter, ran, diff:  55  8.221603419923440015E-02 -9.138899938929938571E-01
iter, ran, diff:  56  4.879567012730348097E-01  4.057406670738004095E-01
iter, ran, diff:  57  1.109193695682364478E-01 -3.770373317047983619E-01
iter, ran, diff:  58  7.625853732324401335E-01  6.516660036642036857E-01
iter, ran, diff:  59  7.625853732324401335E-01  0.000000000000000000E+00
iter, ran, diff:  60  2.831393817822487335E-01 -4.794459914501914000E-01
iter, ran, diff:  61  6.888800488560491431E-01  4.057406670738004095E-01
iter, ran, diff:  62  7.462867195940532383E-01  5.740667073800409526E-02
iter, ran, diff:  63  8.036933903320573336E-01  5.740667073800409526E-02
iter, ran, diff:  64  8.036933903320573336E-01  0.000000000000000000E+00
iter, ran, diff:  65  1.644320683984688003E-01 -6.392613219335885333E-01
iter, ran, diff:  66  5.701727354722692098E-01  4.057406670738004095E-01
iter, ran, diff:  67  6.849860769482774003E-01  1.148133414760081905E-01
iter, ran, diff:  68  1.481334147600819051E-01 -5.368526621881954952E-01
iter, ran, diff:  69  5.538740818338823146E-01  4.057406670738004095E-01
iter, ran, diff:  70  1.605380964906970576E-01 -3.933359853431852571E-01
iter, ran, diff:  71  5.662787635644974671E-01  4.057406670738004095E-01
iter, ran, diff:  72  7.672021111475118005E-01  2.009233475830143334E-01
iter, ran, diff:  73  6.360901160331167148E-01 -1.311119951143950857E-01
iter, ran, diff:  74  6.647934514021187624E-01  2.870333536900204763E-02
iter, ran, diff:  75  9.231234697231371911E-01  2.583300183210184287E-01
iter, ran, diff:  76  3.288641367969376006E-01 -5.942593329261995905E-01
iter, ran, diff:  77  5.034149292976053403E-02 -2.785226438671770666E-01
iter, ran, diff:  78  3.249701648891658579E-01  2.746286719594053238E-01
iter, ran, diff:  79  4.110801709961720007E-01  8.611000610700614288E-02
iter, ran, diff:  80  7.268168600551945246E-01  3.157366890590225239E-01
iter, ran, diff:  81  1.325575271289949342E-01 -5.942593329261995905E-01
iter, ran, diff:  82  2.147735613282293343E-01  8.221603419923440015E-02
iter, ran, diff:  83  8.951429003614350677E-01  6.803693390332057334E-01
iter, ran, diff:  84  9.606624794444940107E-02 -7.990766524169856666E-01
iter, ran, diff:  85  8.749502748152764298E-01  7.788840268708270287E-01
iter, ran, diff:  86  6.864316089628772488E-01 -1.885186658523991809E-01
iter, ran, diff:  87  3.753116578189263919E-01 -3.111199511439508569E-01
iter, ran, diff:  88  4.614216639259325348E-01  8.611000610700614288E-02
iter, ran, diff:  89  8.632683590919612016E-01  4.018466951660286668E-01
iter, ran, diff:  90  5.110403908483931446E-01 -3.522279682435680570E-01
iter, ran, diff:  91  3.512250603649960112E-01 -1.598153304833971333E-01
iter, ran, diff:  92  2.984351275420635830E-01 -5.278993282293242828E-02
iter, ran, diff:  93  7.902858007228701354E-01  4.918506731808065524E-01
iter, ran, diff:  94  9.136098520217217356E-01  1.233240512988516002E-01
iter, ran, diff:  95  8.360105557375590024E-01 -7.759929628416273317E-02
iter, ran, diff:  96  7.623052313611680120E-01 -7.370532437639099044E-02
iter, ran, diff:  97  2.525198759725810760E-02 -7.370532437639099044E-01
iter, ran, diff:  98  9.228433278518650695E-01  8.975913402546069619E-01
iter, ran, diff:  99  1.283834133499510699E-01 -7.944599145019139996E-01
iter, ran, diff: 100  7.311534560989940701E-01  6.027700427490430002E-01

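Of the first 10 numbers generated, 8 are repeats! This is a good illustration of why some generators need a high-entropy state to begin with. However, after 'some' time, the numbers start to look reasonable.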

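For my applications, 100 or so random numbers is a very small cost, so whenever I seed a generator, I prime it in this way. I have not observed this obviously bad behavior with ifort 16.0, gfortran 4.8, or gfortran 8.1. However, non-repeating numbers is a pretty low bar, so I prime the generator for all compilers, not just the ones where I have observed bad behavior.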
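The seed-then-prime habit can be wrapped in a small helper so that every seeding goes through the same steps. This is only a sketch: the module name rng_util, the subroutine name seed_and_prime, and its arguments are hypothetical, and the discard count of 100 is simply the value used in the example above, not a tuned number.

```fortran
module rng_util
   implicit none
   private
   public :: seed_and_prime

contains

   ! Fill the whole seed array with one scalar, then throw away the first
   ! n_discard draws so the generator has moved past its start-up phase.
   subroutine seed_and_prime(scalar_seed, n_discard)
      integer, intent(in) :: scalar_seed
      integer, intent(in) :: n_discard
      integer :: n, i
      integer, allocatable :: state(:)
      real :: dummy

      call random_seed(size=n)
      allocate(state(n))
      state = scalar_seed
      call random_seed(put=state)

      do i = 1, n_discard
         call random_number(dummy)   ! value intentionally unused
      end do
   end subroutine seed_and_prime

end module rng_util

program demo
   use rng_util, only: seed_and_prime
   implicit none
   real :: a(5), b(5)

   ! Seeding twice with the same scalar should replay the same numbers
   call seed_and_prime(20180815, 100)
   call random_number(a)

   call seed_and_prime(20180815, 100)
   call random_number(b)

   write(*,*) 'runs match: ', all(a == b)
end program demo
```

For a debugging or regression run you would pass a fixed scalar; for a production run the scalar could come from wherever the code obtains its 'random' seed, and logging it keeps that run reproducible.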

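As noted in the comments, some compilers attempt to eliminate bad behavior by processing the input state in some way to produce the actual internal state. Gfortran uses an "xor cipher". The operation is reversed on get.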
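One practical consequence is that the array returned by get should be treated as an opaque snapshot: it may or may not equal what was last passed to put, but passing it back to put is expected to resume the same sequence. The following is a minimal sketch of that round trip, assuming nothing beyond the standard random_seed and random_number interfaces; the program and variable names are illustrative.

```fortran
program save_restore
   implicit none
   integer :: n
   integer, allocatable :: saved(:)
   real :: a(4), b(4)

   call random_seed(size=n)
   allocate(saved(n))

   ! Seed from a single scalar and advance the generator a little
   saved = 20180815
   call random_seed(put=saved)
   call random_number(a)

   ! Snapshot the current state, draw, restore the snapshot, draw again
   call random_seed(get=saved)
   call random_number(a)
   call random_seed(put=saved)
   call random_number(b)

   write(*,*) 'replayed identically: ', all(a == b)
end program save_restore
```

This is handy for checkpointing a long run: write the snapshot to a file alongside the results, and a restart can continue from the same point in the random sequence, at least with the same compiler and version.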