Shell script runs out of memory

Question

I wrote the following random number generator shell script:

for i in $(seq 1 $1) # for as many times as the first argument ($1) defines...
do
    echo "$i $((RANDOM%$2))" # print the current iteration number and a random number in [0, $2)
done

I run it like this:

./generator.sh 1000000000 101 > data.txt

to generate 1B lines, each with an id and a random number in [0,100], and store the data in the file data.txt.

My desired output is one "id random-number" pair per line.

It works for a small number of lines, but for 1B I get the following OOM error:

./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)

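Which part of my program causes the error? How can I write the data.txt file line by line? I have tried replacing the echo line with:

echo "$i $((RANDOM%$2))" >> $3

where $3 is data.txt, but I see no difference.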

Answer 1

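$(seq 1 $1) computes the whole list before the loop starts iterating over it. So it needs memory to store the entire list of 10^9 numbers, which is a lot.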
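To put a rough number on "a lot", here is a small sketch that counts how many bytes of text `seq 1 N` prints (each number costs its digits plus one newline). The function name `seq_output_bytes` is made up for this illustration, and the figure covers only the raw text; whatever bash allocates on top of that while expanding the list is not included.

```bash
#!/bin/bash
# seq_output_bytes N: bytes printed by `seq 1 N`
# (hypothetical helper; every number costs its digit count plus one newline).
seq_output_bytes() {
    local n=$1 total=0 digits=1 low=1 high count
    while (( low <= n )); do
        high=$(( low * 10 - 1 ))        # largest number with this many digits
        if (( high > n )); then high=$n; fi
        count=$(( high - low + 1 ))     # how many numbers have this many digits
        total=$(( total + count * (digits + 1) ))
        low=$(( low * 10 ))
        digits=$(( digits + 1 ))
    done
    echo "$total"
}

seq_output_bytes 1000000000   # prints 9888888899
```

So for 10^9 the command substitution has to hold close to 10 GB of text before the loop body runs even once.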

I am not sure whether you can make seq run lazily, i.e. fetch the next number only when it is needed. You can use a simple for loop instead:

for ((i=0; i<$1; ++i))
do
  echo "$i $((RANDOM%$2))"
done
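Note that the loop above counts from 0 to $1-1, while the script in the question numbers its lines from 1 to $1. A minimal sketch of a complete generator.sh that keeps the 1-based ids could look like this; the function name `generate` and the argument check are my own additions, not part of the original script.

```bash
#!/bin/bash
# generate COUNT MODULUS: print "id random" lines for ids 1..COUNT,
# each random value in [0, MODULUS). (Hypothetical helper name.)
generate() {
    local count=$1 modulus=$2 i
    # The arithmetic for loop only keeps the counter in memory;
    # no list of numbers is built before the first iteration.
    for (( i = 1; i <= count; i++ )); do
        echo "$i $(( RANDOM % modulus ))"
    done
}

# When run as a script with two arguments, behave like the question's generator.sh.
if [[ $# -ge 2 ]]; then
    generate "$1" "$2"
fi
```

It would be run the same way as before: ./generator.sh 1000000000 101 > data.txt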

Answer 2

The problem is your for loop:

for i in $(seq 1 $1)

This first expands $(seq 1 $1), creating a very large list, which is then passed to for.

Using while, however, we can read the output of seq line by line, which takes only a small amount of memory:

seq 1 1000000000 | while read -r i; do
    echo "$i"
done
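The loop above only echoes the counter. Applied to the generator from the question, the same pipe-based approach might look like the sketch below; the function name `generate_streaming` is an assumption made for this example.

```bash
#!/bin/bash
# generate_streaming COUNT MODULUS: print "id random" lines for ids 1..COUNT.
# (Hypothetical helper name.) seq writes into a pipe and the while loop
# consumes one line at a time, so the full list is never held in memory.
generate_streaming() {
    local count=$1 modulus=$2 i
    seq 1 "$count" | while read -r i; do
        echo "$i $(( RANDOM % modulus ))"
    done
}

# Example: generate_streaming 1000000000 101 > data.txt
```

The while loop runs in a subshell because it is the right-hand side of a pipe, which does not matter here since nothing from inside the loop is needed afterwards.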

Answer 3

If you want it to be fast, this should work.

You will need to compile it with g++, in the form:

g++ -o <executable> <C++file>

For example, this is how I did it:

g++ -o inseq.exe CTest.cpp

CTest.cpp

#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <count>\n";
        return 1;
    }

    std::stringstream ss;
    int x = std::atoi(argv[1]);
    for (int i = 1; i <= x; i++)
    {
        ss << i << "\n";
        // Write the buffer out every 10000 lines so it never grows large.
        if (i % 10000 == 0)
        {
            std::cout << ss.str();
            ss.clear();
            ss.str(std::string());
        }
    }
    // Write whatever is still left in the buffer.
    std::cout << ss.str();
    return 0;
}

Speed comparison

Lowest time out of 3 runs for each of the methods presented, generating a 1000000-line file.
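The timings in this answer were taken by hand with time. If you want to repeat the "lowest of 3 runs" measurement yourself, a small bash helper along these lines could do it; the name `best_of_3` is made up for this sketch, and it discards the command's output rather than writing it to a file.

```bash
#!/bin/bash
# best_of_3 CMD [ARGS...]: run the command three times and print the
# lowest elapsed (real) time in seconds. (Hypothetical helper name.)
best_of_3() {
    local TIMEFORMAT=%R   # make bash's `time` keyword print only the elapsed seconds
    local run
    for run in 1 2 3; do
        # The command's own output is discarded; the report from `time`
        # goes to stderr of the group, which is redirected into the pipe.
        { time "$@" > /dev/null 2>&1; } 2>&1
    done | sort -n | head -n 1
}

# Example: best_of_3 ./Hari.sh 1000000 101
```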

Jidder

$ time ./inseq 1000000 > file

real    0m0.143s
user    0m0.131s
sys     0m0.011s

Carpetsmoker

$ cat Carpet.sh

#!/bin/bash

seq 1 $1 | while read i; do
    echo $i
done


$ time ./Carpet.sh 1000000 > file

 real    0m12.223s
 user    0m9.753s
 sys     0m2.140s

Hari Shankar

$ cat Hari.sh

#!/bin/bash

for ((i=0; i<$1; ++i))
do
  echo "$i $((RANDOM%$2))"
done


$ time ./Hari.sh 1000000 > file
real    0m9.729s
user    0m8.084s
sys     0m1.064s

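As you can see from the results, my way is slightly faster: roughly 70x faster than the for loop (0.143s vs 9.729s) and roughly 85x faster than the while loop (0.143s vs 12.223s).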

Edit

Because Python is great

$ cat Py.sh

#!/usr/bin/python

for x in xrange(1, 1000000):
    print (x)


$ time ./Py.sh >file

real    0m0.543s
user    0m0.499s
sys     0m0.016s

That is about 4x slower than the C++ version, so if the file took an hour to make with C++, these two lines would take about 4 hours.

Edit 2

I decided to try Python and C++ on the 1000000000-line file.

For a task that is not CPU-intensive, this seems to use a lot of CPU:

PID USER  %CPU   TIME+  COMMAND
56056 me  96     2:51.43 Py.sh

Results for Python

real    9m37.133s
user    8m53.550s
sys     0m8.348s

Results for C++

 real    3m9.047s
 user    2m53.400s
 sys     0m2.842s


