Default random engine for PRNG in C++ generates same output for every instance of a class - proper seed?

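I have no experience with pseudo-random number generation (PRNG), but I have been thinking about it lately because I want to test a few things, and generating data by hand is tedious and frankly quite error-prone.

I have the following class: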

#include <QObject>
#include <QList>
#include <QVector3D>
#include <random>
#include <functional>

// TaskCommData is part of a Task instance (a QRunnable).
// It contains all the data required for partially controlling the runnable
// and what it processes inside its run() method
class TaskCommData : public QObject
{
    friend class Task;
    Q_OBJECT
    // Property is used to abort the run() of the Task and also signal the TaskManager that the Task has changed its running status
    Q_PROPERTY(bool running
               READ isRunning
               WRITE setRunningStatus
               NOTIFY signalRunningStatusChanged)
public:
    QString getId() const;  // Task ID
    bool isRunning() const;
signals:
    void signalRunningStatusChanged(QString id, bool running);
public slots:
    void slotAbort();
private:
    bool running;
    QList<QVector3D> data; // Some data in the form of a list of 3D vectors
    QString id;

    // PRNG related members
    std::default_random_engine* engine;
    std::uniform_int_distribution<>* distribution;
    std::function<int()> dice;

    // Private constructor (don't allow creation of TaskCommData outside the Task class, which instantiates it as a class member)
    explicit TaskCommData(QString id, QObject *parent = 0);

    void setRunningStatus(bool running);
    QList<QVector3D>* getData();
    void generateData();
};

This object is created in a Qt 5.7-based application and attached to a bunch of QRunnables. The important parts are below:

#include <QDebug>
#include "TaskCommData.h"

// ...

TaskCommData::TaskCommData(QString _id, QObject *parent)
    : QObject(parent),
      running(false),
      id(_id)
{
    this->engine = new std::default_random_engine();
    this->distribution = new std::uniform_int_distribution<int>(0, 1);
    this->dice = std::bind(*this->distribution, *this->engine);

    generateData();
}

// ...

void TaskCommData::generateData()
{
    QString s;
    s += QString("Task %1: Generated data [").arg(this->id);
    for(int i = 0; i < 10; ++i) {
        this->data.append(QVector3D(dice(), dice(), dice()));   // PROBLEM occurs here but it's probably just the aftermath
        s += "[" + QString::number(this->data.at(i).x()) + ","
                 + QString::number(this->data.at(i).y()) + ","
                 + QString::number(this->data.at(i).z()) + "]";
    }
    s += "]";
    qDebug() << s;
}

After initialization I get the following output from qDebug() (I create 10 instances of Task, each of which instantiates its own TaskCommData):

"Task task_0: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_0" (sleep:  0)
"Task task_1: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_1" (sleep:  1315)
"Task task_2: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_2" (sleep: 7556)
"Task task_3: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_3" (sleep:  4586)
"Task task_4: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_4" (sleep: 5328)
"Task task_5: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_5" (sleep: 2189)
"Task task_6: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_6" (sleep: 470)
"Task task_7: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_7" (sleep: 6789)
"Task task_8: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_8" (sleep: 6793)
"Task task_9: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_9" (sleep: 9347)

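As you may have guessed from the output, I would like more diversity (obviously not that much diversity is possible, since one chunk of data (a QVector3D) consists of 3 binary values), and something is clearly wrong here.

You may also have noticed the (sleep: ...) in the output. This comes from my TaskManager class, which creates a bunch of Tasks and their respective TaskCommData objects: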

void TaskManager::initData()
{
    // Setup PRNG
    std::default_random_engine generator;
    std::uniform_int_distribution<int> distribution(0,10000); // Between 0 and 10000ms
    auto dice = std::bind(distribution, generator);

    this->tasks.reserve(this->taskCount);
    qDebug() << "Adding" << this->taskCount << "tasks...";
    int msPauseBetweenChunks = 0;

    for(int taskIdx = 0; taskIdx < this->taskCount; ++taskIdx) {
        msPauseBetweenChunks = dice();
        Task* task = new Task("task_" + QString::number(taskIdx), msPauseBetweenChunks);
        task->setAutoDelete(false);
        const TaskCommData *taskCommData = task->getCommData();

        // Manage connections
        connect(taskCommData, SIGNAL(signalRunningStatusChanged(QString, bool)),
                this, SLOT(slotRunningStatusChanged(QString, bool)));
        connect(this, SIGNAL(signalAbort()),
                taskCommData, SLOT(slotAbort()));
        this->tasks.insert(task->getCommData()->getId(), task);
        qDebug() << "Added task " << task->getCommData()->getId() << " (sleep: " << msPauseBetweenChunks << ")";
    }

    emit signalCurrentlyRunningTasks(this->tasksRunning, this->taskCount);
}

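Here I have the same thing (though not as class members) and it works (the range is different, but still).

Initially I had the same piece of code (the part related to random number generation; see TaskManager::initData()) inside void TaskCommData::generateData(), i.e. the engine, distribution and dice function lived on the stack and were destroyed once they went out of scope. The result was the same: the same set of random numbers repeated over and over.

Then I figured the problem was the seed (or rather the lack of one). So I changed the code to: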

// ...
std::chrono::nanoseconds nanoseed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch());
qDebug() << "Setting PRNG engine to seed" << nanoseed.count();
this->engine = new std::default_random_engine();
this->engine->seed(nanoseed.count());
this->distribution = new std::uniform_int_distribution<int>(0, 1);
this->dice = std::bind(*this->distribution, *this->engine);

generateData();
// ...

I got slightly better results:

Setting PRNG engine to seed 1473233571281947000
"Task task_0: Generated data [[1,0,0][0,1,1][0,0,0][0,1,1][1,0,0][1,0,0][0,0,1][1,1,1][1,0,0][1,0,0]]"
Added task  "task_0"  (sleep:  0 )
Setting PRNG engine to seed 1473233571282947700
"Task task_1: Generated data [[1,0,1][1,0,0][1,0,1][0,0,1][1,1,0][0,0,1][0,0,1][0,1,0][0,1,0][0,1,0]]"
Added task  "task_1"  (sleep:  1315 )
Setting PRNG engine to seed 1473233571282947700
"Task task_2: Generated data [[1,0,1][1,0,0][1,0,1][0,0,1][1,1,0][0,0,1][0,0,1][0,1,0][0,1,0][0,1,0]]"
Added task  "task_2"  (sleep:  7556 )
Setting PRNG engine to seed 1473233571283948400
"Task task_3: Generated data [[0,0,1][1,0,1][0,1,1][1,1,1][1,0,0][0,0,0][0,0,1][1,1,0][0,1,1][0,0,1]]"
Added task  "task_3"  (sleep:  4586 )
Setting PRNG engine to seed 1473233571283948400
"Task task_4: Generated data [[0,0,1][1,0,1][0,1,1][1,1,1][1,0,0][0,0,0][0,0,1][1,1,0][0,1,1][0,0,1]]"
Added task  "task_4"  (sleep:  5328 )
Setting PRNG engine to seed 1473233571284950700
"Task task_5: Generated data [[0,0,0][1,1,0][0,0,1][0,0,1][0,1,1][1,0,0][1,0,0][1,0,1][0,0,0][0,0,0]]"
Added task  "task_5"  (sleep:  2189 )
Setting PRNG engine to seed 1473233571284950700
"Task task_6: Generated data [[0,0,0][1,1,0][0,0,1][0,0,1][0,1,1][1,0,0][1,0,0][1,0,1][0,0,0][0,0,0]]"
Added task  "task_6"  (sleep:  470 )
Setting PRNG engine to seed 1473233571285950800
"Task task_7: Generated data [[0,0,0][1,0,0][0,1,1][1,0,0][1,0,1][0,1,0][1,0,1][0,1,0][1,1,0][0,0,1]]"
Added task  "task_7"  (sleep:  6789 )
Setting PRNG engine to seed 1473233571285950800
"Task task_8: Generated data [[0,0,0][1,0,0][0,1,1][1,0,0][1,0,1][0,1,0][1,0,1][0,1,0][1,1,0][0,0,1]]"
Added task  "task_8"  (sleep:  6793 )
Setting PRNG engine to seed 1473233571286950900
"Task task_9: Generated data [[1,0,1][1,1,1][1,0,0][1,1,0][0,1,1][0,0,0][1,0,1][1,0,1][0,0,0][1,0,1]]"
Added task  "task_9"  (sleep:  9347 )

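However, there is still too much repetition (the same data seems to be generated in pairs). This approach also has the big drawback that it depends on how fast the TaskCommData objects are created and on the time between the creation of two instances of the class. The faster the creation, the smaller the difference measured with std::chrono::system_clock::now(). This doesn't seem like a good way to generate a seed (of course I may be wrong :D).

Any idea how to solve this? Even if the problem is the seed, I still don't understand why everything works fine in TaskManager::initData() but not so well here.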

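So, yes, the first case behaves correctly: if you seed all your PRNGs with the same (default) seed, they have to produce the same sequence of numbers. That is what they are designed to do. TaskManager::initData() looks fine only because all of its draws come from a single engine whose state advances with every call, whereas each TaskCommData constructs a fresh engine that starts from the same initial state.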
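
To make the difference concrete, here is a minimal sketch. The helper names drawBitsFromFreshEngine and drawBitsFromSharedEngine are made up for illustration and are not part of the code in the question. The first helper mimics the TaskCommData constructor (a new default-seeded engine per call), the second mimics TaskManager::initData() (one engine reused for all draws).

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws `count` bits from a brand-new, default-seeded
// engine. Every call starts from the same initial engine state.
std::vector<int> drawBitsFromFreshEngine(std::size_t count)
{
    std::default_random_engine engine;  // default seed, identical on every call
    std::uniform_int_distribution<int> bit(0, 1);
    std::vector<int> bits;
    bits.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        bits.push_back(bit(engine));
    return bits;
}

// Hypothetical helper: draws from an engine owned by the caller. The engine's
// state advances, so a later call continues the sequence instead of
// restarting it.
std::vector<int> drawBitsFromSharedEngine(std::default_random_engine& engine,
                                          std::size_t count)
{
    std::uniform_int_distribution<int> bit(0, 1);
    std::vector<int> bits;
    bits.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        bits.push_back(bit(engine));
    return bits;
}
```

Calling drawBitsFromFreshEngine(30) ten times returns ten identical vectors, which is the pattern in the first log. Passing one long-lived engine to drawBitsFromSharedEngine keeps consuming the same sequence instead of replaying its beginning.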

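In your second case you use a time-based seed, and you noticed that this isn't great either, because you only get a handful of distinct seed values (six for ten tasks in your log). As you also noticed, that is not surprising, since the seeds are all generated at roughly the same time. So this is another illustration of why time-based seeds are usually bad. Honestly, I don't know why we still teach that. Cases where seeding from the time is a good idea are actually very rare¹: as soon as you need something that is genuinely unpredictable from outside, the time is the wrong source, and if you don't need real unpredictability, any static seed will do.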
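
A small sketch of how one might check for such collisions. Both helper names are hypothetical. collectTimeSeeds takes seeds back to back the way objects constructed in a loop would, and countDistinct reports how many different values actually came out.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: takes `n` time-based seeds in quick succession,
// using the same clock and unit as the code in the question.
std::vector<std::int64_t> collectTimeSeeds(std::size_t n)
{
    std::vector<std::int64_t> seeds;
    seeds.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const auto sinceEpoch = std::chrono::system_clock::now().time_since_epoch();
        seeds.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(sinceEpoch).count());
    }
    return seeds;
}

// Hypothetical helper: number of different values in `seeds`.
std::size_t countDistinct(const std::vector<std::int64_t>& seeds)
{
    return std::set<std::int64_t>(seeds.begin(), seeds.end()).size();
}
```

How many distinct values countDistinct(collectTimeSeeds(10)) reports depends on the clock's effective resolution and on how fast the loop runs, so the result varies between platforms. Applied to the ten seeds printed in the question's log, countDistinct gives 6.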

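So, here's the thing: how about simply using your task number as the seed? That way you are guaranteed to have as many different PRN sequences as you have tasks. If you need different values in different runs, you can still first take a time-based random number (or better: ask your OS for a random number!) and add the task number to it, which again gives you guaranteed different sequences.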
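
A minimal sketch of that idea, under a few assumptions. The names drawRunBase and makeTaskEngine are hypothetical. The sketch uses std::mt19937 rather than std::default_random_engine, because the latter is implementation-defined; where it is a minstd-style linear congruential engine, the seeds 0 and 1 select the same state, so task_0 and task_1 would still repeat each other. std::random_device stands in for "ask your OS for a random number", although how it obtains its values is also up to the implementation.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: one base value per program run, requested from the
// platform through std::random_device.
std::uint32_t drawRunBase()
{
    std::random_device device;
    return static_cast<std::uint32_t>(device());
}

// Hypothetical helper: per-task engine seeded with base + task index.
// Unsigned addition wraps around, so for a fixed base two different task
// indices always produce two different 32-bit seeds.
std::mt19937 makeTaskEngine(std::uint32_t base, std::uint32_t taskIndex)
{
    return std::mt19937(base + taskIndex);
}
```

In the question's setup this could look like calling drawRunBase() once in TaskManager::initData() and handing makeTaskEngine(base, taskIdx) to each task. Passing a fixed base such as 0 instead gives runs that are reproducible but still differ from task to task.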


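¹ Time-based seeding has been the security flaw behind a lot of unauthorized access. Typical example: some networked process-control system has a web interface that you need to log in to. You then get a cookie with a secret session ID. The only problem is that this session ID is just a random number run through a known "stringifier", and the RNG was seeded with the time at which the actual user logged in. Because it is usually easy to determine the device's time, and easy to guess the time window in which the login may have happened, that session ID is far from secret and can typically be brute-forced with a very small number of attempts.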
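
To illustrate how small that search is, here is a toy sketch. The scheme is invented for the example and does not describe any particular product: the session ID is assumed to be the first output of a std::mt19937 seeded with the login time in seconds, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Hypothetical token scheme: the session ID is the first output of an engine
// seeded with the login time in seconds.
std::uint32_t makeSessionId(std::uint32_t loginTimeSeconds)
{
    std::mt19937 engine(loginTimeSeconds);
    return static_cast<std::uint32_t>(engine());
}

// Tries every second in [windowStart, windowEnd] and returns a seed that
// reproduces `sessionId`, or -1 if no second in the window does.
std::int64_t recoverSeed(std::uint32_t sessionId,
                         std::uint32_t windowStart,
                         std::uint32_t windowEnd)
{
    // 64-bit loop variable so the loop also terminates when windowEnd is the
    // largest 32-bit value.
    for (std::uint64_t t = windowStart; t <= windowEnd; ++t) {
        if (makeSessionId(static_cast<std::uint32_t>(t)) == sessionId)
            return static_cast<std::int64_t>(t);
    }
    return -1;
}
```

If the login is known to have happened within one hour either side of some estimate, the loop tries at most 7201 candidate seeds, which takes a negligible amount of time.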