Default random engine for PRNG in C++ generates same output for every instance of a class - proper seed?
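I'm inexperienced with pseudo-random number generation (PRNG), but I've been thinking about it lately because I want to test some things, and generating the data manually is tedious and, frankly, very error-prone.
I have the following class: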
#include <QObject>
#include <QList>
#include <QVector3D>
#include <random>
#include <functional>
// TaskCommData is part of a Task instance (a QRunnable).
// It contains all the data required for partially controlling the runnable
// and what it processes inside its run() method
class TaskCommData : public QObject
{
    friend class Task;
    Q_OBJECT
    // Property is used to abort the run() of the Task and also signal the TaskManager that the Task has changed its running status
    Q_PROPERTY(bool running
               READ isRunning
               WRITE setRunningStatus
               NOTIFY signalRunningStatusChanged)
public:
    QString getId() const; // Task ID
    bool isRunning() const;
signals:
    void signalRunningStatusChanged(QString id, bool running);
public slots:
    void slotAbort();
private:
    bool running;
    QList<QVector3D> data; // Some data in the form of a list of 3D vectors
    QString id;
    // PRNG related members
    std::default_random_engine* engine;
    std::uniform_int_distribution<>* distribution;
    std::function<int()> dice;
    // Private constructor (don't allow creation of TaskCommData outside the Task class, which instantiates the class as its class member)
    explicit TaskCommData(QString id, QObject *parent = 0);
    void setRunningStatus(bool running);
    QList<QVector3D>* getData();
    void generateData();
};
This object is created in a Qt 5.7-based application and attached to a set of QRunnables. The important parts are below:
#include <QDebug>
#include "TaskCommData.h"
// ...
TaskCommData::TaskCommData(QString _id, QObject *parent)
    : QObject(parent),
      running(false),
      id(_id)
{
    this->engine = new std::default_random_engine();
    this->distribution = new std::uniform_int_distribution<int>(0, 1);
    this->dice = std::bind(*this->distribution, *this->engine);
    generateData();
}
// ...
void TaskCommData::generateData()
{
    QString s;
    s += QString("Task %1: Generated data [").arg(this->id);
    for(int i = 0; i < 10; ++i) {
        this->data.append(QVector3D(dice(), dice(), dice())); // PROBLEM occurs here but it's probably just the aftermath
        s += "[" + QString::number(this->data.at(i).x()) + ","
                 + QString::number(this->data.at(i).y()) + ","
                 + QString::number(this->data.at(i).z()) + "]";
    }
    s += "]";
    qDebug() << s;
}
After initialization I get the following output from qDebug() (I create 10 instances of Task, each of which instantiates a TaskCommData, one per task):
"Task task_0: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_0" (sleep: 0)
"Task task_1: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_1" (sleep: 1315)
"Task task_2: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_2" (sleep: 7556)
"Task task_3: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_3" (sleep: 4586)
"Task task_4: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_4" (sleep: 5328)
"Task task_5: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_5" (sleep: 2189)
"Task task_6: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_6" (sleep: 470)
"Task task_7: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_7" (sleep: 6789)
"Task task_8: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_8" (sleep: 6793)
"Task task_9: Generated data [[1,0,0][0,1,0][1,1,0][1,0,1][0,0,1][0,1,1][0,0,0][1,1,1][0,1,1][1,0,1]]"
Added task "task_9" (sleep: 9347)
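As you have probably guessed from the output, I'd like more variety (obviously not that much variety is possible, since one chunk of data (a QVector3D) contains 3 binary values), and something is clearly going wrong here.
You may also have noticed the (sleep: ...) in the output. It comes from my TaskManager class, which creates a bunch of Tasks and their respective TaskCommData objects: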
void TaskManager::initData()
{
    // Setup PRNG
    std::default_random_engine generator;
    std::uniform_int_distribution<int> distribution(0,10000); // Between 0 and 10000ms
    auto dice = std::bind(distribution, generator);
    this->tasks.reserve(this->taskCount);
    qDebug() << "Adding" << this->taskCount << "tasks...";
    int msPauseBetweenChunks = 0;
    for(int taskIdx = 0; taskIdx < this->taskCount; ++taskIdx) {
        msPauseBetweenChunks = dice();
        Task* task = new Task("task_" + QString::number(taskIdx), msPauseBetweenChunks);
        task->setAutoDelete(false);
        const TaskCommData *taskCommData = task->getCommData();
        // Manage connections
        connect(taskCommData, SIGNAL(signalRunningStatusChanged(QString, bool)),
                this, SLOT(slotRunningStatusChanged(QString, bool)));
        connect(this, SIGNAL(signalAbort()),
                taskCommData, SLOT(slotAbort()));
        this->tasks.insert(task->getCommData()->getId(), task);
        qDebug() << "Added task " << task->getCommData()->getId() << " (sleep: " << msPauseBetweenChunks << ")";
    }
    emit signalCurrentlyRunningTasks(this->tasksRunning, this->taskCount);
}
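Here I have the same thing (although not as class members) and it works (the range is different, but still).
Initially I had the same code snippets (the ones related to random number generation; see TaskManager::initData()) inside my void TaskCommData::generateData(), i.e. the engine, the distribution and the dice lived on the stack and were destroyed once they went out of scope. The result was the same: the same set of random numbers repeated over and over again.
Then I figured the problem lay in the seed (or rather, the lack of one). So I changed my code to: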
// ...
std::chrono::nanoseconds nanoseed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch());
qDebug() << "Setting PRNG engine to seed" << nanoseed.count();
this->engine = new std::default_random_engine();
this->engine->seed(nanoseed.count());
this->distribution = new std::uniform_int_distribution<int>(0, 1);
this->dice = std::bind(*this->distribution, *this->engine);
generateData();
// ...
And I got slightly better results:
Setting PRNG engine to seed 1473233571281947000
"Task task_0: Generated data [[1,0,0][0,1,1][0,0,0][0,1,1][1,0,0][1,0,0][0,0,1][1,1,1][1,0,0][1,0,0]]"
Added task "task_0" (sleep: 0 )
Setting PRNG engine to seed 1473233571282947700
"Task task_1: Generated data [[1,0,1][1,0,0][1,0,1][0,0,1][1,1,0][0,0,1][0,0,1][0,1,0][0,1,0][0,1,0]]"
Added task "task_1" (sleep: 1315 )
Setting PRNG engine to seed 1473233571282947700
"Task task_2: Generated data [[1,0,1][1,0,0][1,0,1][0,0,1][1,1,0][0,0,1][0,0,1][0,1,0][0,1,0][0,1,0]]"
Added task "task_2" (sleep: 7556 )
Setting PRNG engine to seed 1473233571283948400
"Task task_3: Generated data [[0,0,1][1,0,1][0,1,1][1,1,1][1,0,0][0,0,0][0,0,1][1,1,0][0,1,1][0,0,1]]"
Added task "task_3" (sleep: 4586 )
Setting PRNG engine to seed 1473233571283948400
"Task task_4: Generated data [[0,0,1][1,0,1][0,1,1][1,1,1][1,0,0][0,0,0][0,0,1][1,1,0][0,1,1][0,0,1]]"
Added task "task_4" (sleep: 5328 )
Setting PRNG engine to seed 1473233571284950700
"Task task_5: Generated data [[0,0,0][1,1,0][0,0,1][0,0,1][0,1,1][1,0,0][1,0,0][1,0,1][0,0,0][0,0,0]]"
Added task "task_5" (sleep: 2189 )
Setting PRNG engine to seed 1473233571284950700
"Task task_6: Generated data [[0,0,0][1,1,0][0,0,1][0,0,1][0,1,1][1,0,0][1,0,0][1,0,1][0,0,0][0,0,0]]"
Added task "task_6" (sleep: 470 )
Setting PRNG engine to seed 1473233571285950800
"Task task_7: Generated data [[0,0,0][1,0,0][0,1,1][1,0,0][1,0,1][0,1,0][1,0,1][0,1,0][1,1,0][0,0,1]]"
Added task "task_7" (sleep: 6789 )
Setting PRNG engine to seed 1473233571285950800
"Task task_8: Generated data [[0,0,0][1,0,0][0,1,1][1,0,0][1,0,1][0,1,0][1,0,1][0,1,0][1,1,0][0,0,1]]"
Added task "task_8" (sleep: 6793 )
Setting PRNG engine to seed 1473233571286950900
"Task task_9: Generated data [[1,0,1][1,1,1][1,0,0][1,1,0][0,1,1][0,0,0][1,0,1][1,0,1][0,0,0][1,0,1]]"
Added task "task_9" (sleep: 9347 )
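There is still too much repetition, though (identical data sets seem to be generated in pairs). This also has the huge disadvantage that it depends on how quickly the TaskCommData objects are created and on the time between creating two instances of this class. The faster the creation, the smaller the difference measured with std::chrono::system_clock::now(). That doesn't seem like a good way to generate a seed (of course I may be wrong :D).
Any idea how to solve this? Even if the problem is the seed, I still don't understand why everything works fine in TaskManager::initData() but not so much here.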
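So, yes, your first case behaves correctly: if you seed all your PRNGs with the same (default) seed, they must produce the same sequence of numbers. That's what they are designed to do.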
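To make this concrete, here is a minimal sketch (not the asker's code; the helper name drawBits is made up for illustration). It shows that every default-constructed std::default_random_engine starts in the same state, so separate engines replay the same values. That is probably also why TaskManager::initData() looks fine: there, all draws come from a single engine that keeps advancing, whereas each TaskCommData creates its own fresh engine.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: draw `count` values in [0, 1] from a brand-new,
// default-constructed engine. No seed is passed, so the engine starts
// from the library's fixed default seed on every call.
std::vector<int> drawBits(int count)
{
    std::default_random_engine engine;            // default seed, same every time
    std::uniform_int_distribution<int> bit(0, 1); // same range as in the question
    std::vector<int> values;
    values.reserve(count);
    for (int i = 0; i < count; ++i) {
        values.push_back(bit(engine));            // the engine advances only within this call
    }
    return values;
}
```

Calling drawBits(30) twice returns two identical vectors, which mirrors the ten identical "Generated data" lines in the first log.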
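In your second case, you use time-based seeds, and you noticed that this isn't great either: you effectively get only six different seed values for ten tasks, which, as you also noticed, is not surprising, since the seeds are generated at roughly the same time. So this is yet another illustration of why time-based seeds are usually a bad idea. Honestly, I don't know why we still teach that. Cases where seeding from the time is a good idea are actually very rare¹ once you need something that is genuinely unpredictable from the outside. If you don't need real unpredictability, any static seed will do.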
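So, here's the thing: how about simply using your task number as the seed? That way you are guaranteed to have as many different PRN sequences as you have tasks. If you need different values across different runs, you can still first take a time-based random number (or better: ask your OS for a random number!) and add the task number to it, which again gives you guaranteed different sequences.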
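A rough sketch of that suggestion follows. It assumes std::random_device as the way to "ask the OS" for a number, and the helper names makeRunBaseSeed and makeTaskEngine are hypothetical, not part of the question's code.

```cpp
#include <random>

// Hypothetical helper: one base value per program run, requested from the
// platform's entropy source via std::random_device (an assumption; the
// answer only says "ask your OS for a random number").
unsigned int makeRunBaseSeed()
{
    std::random_device device;
    return device();
}

// Hypothetical helper: build the engine for one task by adding the task
// number to the per-run base value. Unsigned addition wraps around, which
// is well defined and harmless for a seed.
std::default_random_engine makeTaskEngine(unsigned int runBaseSeed, unsigned int taskIndex)
{
    return std::default_random_engine(runBaseSeed + taskIndex);
}
```

In the question's code this would mean calling makeRunBaseSeed() once in TaskManager::initData() and passing taskIdx (or the finished engine) into each TaskCommData, which is an assumed change to its constructor. For reproducible test data, a fixed base value can be used instead of makeRunBaseSeed(). One caveat: distinct seeds do not map to distinct states on every engine. For linear congruential engines with a zero increment, such as std::minstd_rand0, the standard maps a seed of 0 to the same state as a seed of 1, so std::mt19937 may be a safer choice if that edge case matters.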
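¹ Time-based seeding has been the security problem behind a large number of unauthorized accesses. Typical example: some networked process control system has a web interface that you need to log in to. You then get a cookie with a secret session ID. The only problem is that this session ID is just a random number run through a known "stringifier", and the RNG is seeded with the time at which the actual user logged in. Since the device's time is usually easy to determine, and the time window in which the login probably happened is easy to guess, that session ID is far from secret and can often be brute-forced with very few attempts.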