Generate a random file of 0's and 1's

Question

For one of my projects I want to generate a 2 MB random file that contains only 0's and 1's, on Linux or Windows. I tried this command on Linux:

$ time dd if=/dev/urandom of=/dev/null bs=1M count=2

But /dev/urandom just returns arbitrary random bytes from the kernel, which are copied into the file as-is. That is not what I need. Any ideas?

Answer 1

EDIT: All these solutions are pretty bad in practice. tripleee's proposal in the question's comments (pipe the output of /dev/urandom to perl -0777 -ne 'print unpack("b*")') is much better.
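For readers who want the same transformation outside Perl, here is a minimal Python sketch of what unpack("b*") does: each input byte becomes eight '0'/'1' characters, least significant bit first. The helper names are hypothetical, and os.urandom stands in for reading /dev/urandom directly.

```python
import os


def bits_lsb_first(data):
    """Mimic Perl's unpack("b*"): 8 chars per byte, lowest bit first."""
    # format(b, "08b") lists the bits most significant first, so reverse each group.
    return "".join(format(b, "08b")[::-1] for b in data)


def write_random_bits(out, n_chars, chunk_bytes=65536):
    """Write exactly n_chars random '0'/'1' characters to the text stream out."""
    remaining = n_chars
    while remaining > 0:
        text = bits_lsb_first(os.urandom(chunk_bytes))
        piece = text[:remaining]  # trim the last chunk, much like head -c does
        out.write(piece)
        remaining -= len(piece)
```

For a 2 MB file, something like `with open("random.txt", "w") as f: write_random_bits(f, 2 * 1024 * 1024)` should do. Each random byte yields 8 output characters, so no random data is wasted.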

Do you need something fast? If not, you can try this (it took me ~2 min):

$ time (for i in `seq 1 $((2*1024*1024))`; 
  do echo -n $(($RANDOM%2)); done > random.txt)

You can make it faster by calling $RANDOM less often, for example:

$ time (i=$((2*1024*1024)); a=0; while [ $i -gt 0 ]; do if [ $a -lt 2 ]; then 
a=$RANDOM; fi; echo -n "$(($a%2))"; let a=$a/2; let i=$i-1; done > random.txt)

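In my case this was almost 4 times faster. It works by extracting the rightmost bit of the number until there are no more 1s left in it, so it might be slightly biased toward 1.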
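The bias question can be checked exactly rather than by sampling. Below is a small Python model of the loop body (the helper name is hypothetical, and it assumes $RANDOM is uniform on 0..32767 as Bash documents). It enumerates every possible draw and counts the bits the loop would print before it redraws:

```python
def emitted_bits(a):
    """Bits the shell loop prints from one fresh $RANDOM value a."""
    bits = [a % 2]   # a freshly drawn value is always printed once
    a //= 2
    while a >= 2:    # the loop only redraws once a drops below 2
        bits.append(a % 2)
        a //= 2
    return bits


ones = zeros = 0
for value in range(32768):  # every value $RANDOM can take
    bits = emitted_bits(value)
    ones += sum(bits)
    zeros += len(bits) - sum(bits)

print(f"ones={ones} zeros={zeros}")
```

Because every draw is equally likely, comparing these totals gives the long-run fraction of 1s in the output stream. Under this model the leading 1 of each value is never printed and the bits below it are uniform, so the totals come out equal. Any residual bias would then have to come from $RANDOM itself rather than from the bit extraction.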

However, if you want a fast solution, you obviously shouldn't use a shell scripting language. You can easily do it in Python (this takes about 2 seconds in my case):

$ time (python -c "import random; print(''.join('{0}'.format(n) for n in 
random.sample([0,1]*16*1024*1024, 2*1024*1024)));" > random.txt)

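Here I take a random sample (without replacement) from a large list of 0's and 1's. However, I'm not sure how sampling affects the quality of the randomness. If the list is large compared to the sample, I think it should give high-quality results, but here it is only 16 times larger (32 Mi elements for a 2 Mi sample), so the effect may be measurable.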
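The size of that effect can be estimated with the finite-population correction for sampling without replacement. The number of 1s in the sample follows a hypergeometric distribution, whose variance is smaller than that of independent fair bits by the factor (N - n)/(N - 1). A short Python check with the sizes from the command above:

```python
N = 2 * 16 * 1024 * 1024  # len([0, 1] * 16 * 1024 * 1024): the population
K = N // 2                # number of 1s in the population
n = 2 * 1024 * 1024       # sample size passed to random.sample

p = K / N
var_independent = n * p * (1 - p)                  # binomial: independent fair bits
var_sampled = var_independent * (N - n) / (N - 1)  # hypergeometric: without replacement

print(var_independent, var_sampled, var_sampled / var_independent)
```

The ratio is about 15/16, so the count of 1s varies roughly 6% less in variance (about 3% less in standard deviation) than it would for truly independent bits. A statistical test on the bit counts could plausibly detect that at this file size.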

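Note that randomness is not as easy as it looks. The outputs of the solutions I propose here do not all have the same properties, and verifying which properties a given output has is often complicated. You may want to trade performance for 'better' randomness, in which case this Python version may be better (about 6 seconds in my case):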

$ time (python -c "from __future__ import print_function; import random;
[print(random.randint(0,1), end='') for i in range(0, 2*1024*1024)];" > random.txt)

Here, random.randint should give uniformly distributed, independent results.
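If the goal is uniform, independent bits without paying for one randint call per character, one option is to ask Python's generator for 64 bits at a time with random.getrandbits and format each word as a zero-padded binary string. This is a sketch of that idea, not something taken from the answers here, and the function name is hypothetical. random.Random is a Mersenne Twister, which is fine for test data but not for cryptographic use.

```python
import random


def random_bit_string(n_chars, rng=None):
    """Return n_chars '0'/'1' characters drawn 64 bits at a time."""
    if rng is None:
        rng = random.Random()
    words = -(-n_chars // 64)  # ceiling division: number of 64-bit draws needed
    # "064b" keeps leading zeros, so every draw contributes exactly 64 characters.
    text = "".join(format(rng.getrandbits(64), "064b") for _ in range(words))
    return text[:n_chars]
```

Usage: `with open("random.txt", "w") as f: f.write(random_bit_string(2 * 1024 * 1024))`. Passing a seeded `random.Random(seed)` makes the output reproducible, which can help when testing.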

Answer 2

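Colin's solutions are extremely inefficient: one builds a huge list and then samples from it (so it won't work if you want a larger file), and the other produces only one character per loop iteration: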

$ time (python3 -c "import random; print(''.join('{0}'.format(n) for n in 
random.sample([0,1]*16*1024*1024, 2*1024*1024)));" > /dev/null)

real    0m4,034s
user    0m3,856s
sys     0m0,137s

$ time (python3 -c "from __future__ import print_function; import random;
[print(random.randint(0,1), end='') for i in range(0, 2*1024*1024)];" > /dev/null)

real    0m6,461s
user    0m6,435s
sys     0m0,016s

This is much faster:

$ time (perl -077 -ne 'print unpack("b*")' < /dev/urandom | head -c2M >/dev/null)

real    0m0,007s
user    0m0,006s
sys     0m0,003s

Here head -c2M limits the output to 2 MB.

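In theory, processing 8 bytes per loop iteration instead of 1 byte at a time like that should be faster, although I don't know how to make it more efficient in Perl: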

$ time (</dev/urandom perl -nle 'BEGIN{$/=\8; $,=""} printf("%.64b", unpack("Q"))' |
head -c2M >/dev/null)

real    0m0,027s
user    0m0,019s
sys     0m0,010s
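The 8-bytes-per-iteration idea in the Perl command above can be mirrored in Python as a rough sketch. It reads an 8-byte record, interprets it as an unsigned 64-bit integer (Perl's Q uses native byte order, hence the sys.byteorder default), and prints it as 64 binary digits, like printf("%.64b"). The function names are hypothetical.

```python
import os
import sys


def quad_to_bits(chunk, byteorder=sys.byteorder):
    """64 '0'/'1' chars for one 8-byte record, most significant bit first."""
    value = int.from_bytes(chunk, byteorder)  # like Perl's unpack("Q")
    return format(value, "064b")              # like printf("%.64b", ...)


def random_bits_by_quads(n_chars):
    """Build n_chars random bits from 8-byte blocks of os.urandom."""
    quads = -(-n_chars // 64)  # ceiling division
    data = os.urandom(8 * quads)
    text = "".join(quad_to_bits(data[i:i + 8]) for i in range(0, len(data), 8))
    return text[:n_chars]
```

Since the bytes are random, the byte order does not affect the quality of the output; it only matters when comparing results bit for bit with the Perl version.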

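In "What's the fastest way to generate a 1 GB text file containing random digits?" there are answers that produce space-separated decimal digits at GBs or even tens of GBs per second. Generating only binary values without any spaces, as in your case, should be even faster. I've adapted some of those answers to generate 0's and 1's. Here are some benchmarks on my Ubuntu 18.04 VM (Core i7-8700, 2 GB RAM):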

$ time (LC_ALL=C tr '\0-\377' '[0*128][1*128]' </dev/urandom | head -c2M >/dev/null)

real    0m0,012s
user    0m0,003s
sys     0m0,012s
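The tr command maps the 256 possible byte values onto two characters: bytes 0-127 become '0' and bytes 128-255 become '1', so each random byte yields one fair bit (only its top bit matters). The same byte-to-character mapping in Python, as a sketch using bytes.maketrans and bytes.translate (the function name is hypothetical):

```python
import os

# 256-entry table: values 0..127 -> b"0", 128..255 -> b"1" (like tr's [0*128][1*128])
TABLE = bytes.maketrans(bytes(range(256)), b"0" * 128 + b"1" * 128)


def random_bits_tr_style(n_chars):
    """One output character per random byte, decided by the byte's top bit."""
    return os.urandom(n_chars).translate(TABLE)
```

The design trades randomness for speed: it consumes 8 random bits per output character, whereas unpack("b*") uses all 8, but the translation runs as a single table lookup per byte.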


$ time (jot -s "" -r -c $((2*1024*1024)) 48 49) >/dev/null

real    0m0,297s
user    0m0,279s
sys     0m0,008s

$ time (shuf -r -n $((2*1024*1024)) -i 0-1 -z | tr -d "\0" >/dev/null)

real    0m0,383s
user    0m0,384s
sys     0m0,000s

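In fact, /dev/urandom isn't even fast enough. You can replace it with </dev/zero openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom to get a faster stream of random bytes on CPUs with the AES instruction set. Here are the timings for a 20 MB output (for 2 MB the tr command above finishes so quickly that the results returned by time vary widely):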

$ time (</dev/zero openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom 2> /dev/null |
LC_ALL=C tr '\0-\377' '[0*128][1*128]' | head -c20M >/dev/null)

real    0m0,023s
user    0m0,016s
sys     0m0,023s

$ time (</dev/zero openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom 2> /dev/null |
perl -077 -ne 'print unpack("b*")' | head -c20M >/dev/null)

real    0m0,038s
user    0m0,024s
sys     0m0,019s

$ time (</dev/zero openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom 2> /dev/null |
jot -s "" -r -c $((20*1024*1024)) 48 49 >/dev/null)

real    0m2,820s
user    0m2,820s
sys     0m0,000s
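The openssl trick above works by seeding a cipher once from /dev/urandom and then letting it stretch that seed into a long keystream. The Python standard library has no AES, so this sketch swaps in SHA-256 over a counter. That hash-based construction is chosen only for illustration; it is neither AES-CTR nor a vetted DRBG, and the function names are hypothetical. It shows the same seed-then-expand structure:

```python
import hashlib
import os


def keystream(seed, n_bytes):
    """Expand seed into n_bytes by hashing seed || counter, counter = 0, 1, 2, ..."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])


def random_bits_from_seed(n_chars, seed=None):
    """'0'/'1' text from the keystream, one character per byte's top bit."""
    if seed is None:
        seed = os.urandom(32)  # the only read from the OS entropy source
    table = bytes.maketrans(bytes(range(256)), b"0" * 128 + b"1" * 128)
    return keystream(seed, n_chars).translate(table)
```

In pure Python this will not come close to openssl's hardware-accelerated AES. The point is the structure: after the single 32-byte read, every further output byte comes from the expansion function rather than from the kernel.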
