有没有比使用 while 循环更有效的方法来创建每行都有重复文本的文件？（一百万行+）

Question

我需要创建一个仅包含点符号“.”的文本文件。在每一行上重复，直到达到存储在变量中的特定行数。我现在正在使用 while 循环，但是那些带点的文件需要大约 0.5-5 百万行。因此，它花费的时间比我希望的要长一些。以下是我当前的代码：

j=0
while [[ $j != $length ]] 
do
  echo "." >> $file
  ((j++))
done

所以我的问题是：除了使用 while 循环之外，是否有更有效的方法来创建一个包含 x 行且每行包含重复的相同字符（或字符串）的文件？

谢谢，

Answer 1

您可以使用 yes 和 head:

yes . | head -n "$length" > "$file"

这比反复打开和关闭文件一次写入两个字节要快得多。

Answer 2

bash 中的重复操作（例如，通过循环）总是会很慢，如果仅仅是因为启动一个新的 OS 进程（每次通过循环）的开销) 对于循环中的每个命令。在这种情况下，每次通过循环打开和关闭输出文件都会产生额外的开销。

您想寻找一种解决方案来限制需要 created/closed 的 OS 个进程的数量（在这种情况下，限制您 open/close 输出的次数文件）。根据您要使用的 software/tool/binary，将会有很多选项。

一个awk想法：

awk -v len="${length}" 'BEGIN {for (i=1;i<=len;i++) print "."}' > newfile

虽然这确实在 awk 中使用了 'loop'，但我们只查看 bash 级别的单个 OS 进程，而且我们只opening/closing一次输出文件。

Answer 3

此代码中资源最密集的部分是重定向 (echo '.' > $file)。为了解决这个问题，您需要“构建”一个字符串并仅重定向到 $file 一次，而不是 $length 次。

j=0
while [[ $j != $length ]]
do
    builder=${builder}.
done
echo "$builder" > $file

但是您仍处于循环中，这可能不是资源的最佳利用方式。为了解决这个问题，让我们从 this answer:

中汲取灵感

printf '.\n%.0s' $(seq $length) > $file

请注意，这里我们使用 $(seq $length) 而不是 {1..$length}，因为如果长度为 10，bash 不会将 {1..$length} 扩展为 0 1 2 3 4 5 6 7 8 9 10（请参阅 this question)

Answer 4

每次都应该使文件大小加倍。也许它比其他一些解决方案更有效，也许不是。文件“b”的大小将保持加倍，直到加倍超过长度。当长度是 2 的幂时，我认为这会非常有效。

let n=2
let length=1000000
echo '.' > a
cat a a > b
rm a
while [[ $((n*2)) -le $length ]]; do
  mv b a
  cat a a > b
  rm a 
  let n=n*2
done
# do something here to fill out the remaining length-n lines

Answer 5

使用 dd 写入输出文件（用时不到 2 秒）

time yes . | dd of=dotbig.txt count=1024 bs=1048576 iflag=fullblock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.76116 s, 610 MB/s

real    0m1.814s
user    0m0.076s
sys     0m0.686s

行数

wc -l dotbig.txt
536870912 dotbig.txt

内容示例：

head -n 3 dotbig.txt ; tail -n 3 dotbig.txt
.
.
.
.
.
.

有没有比使用 while 循环更有效的方法来创建每行都有重复文本的文件？（一百万行+）

Is there a more efficient way of creating a file with repetitive text on each line than using a while loop? (1 million lines +)

bash

shell

有没有比使用 while 循环更有效的方法来创建每行都有重复文本的文件？ （一百万行+）

Is there a more efficient way of creating a file with repetitive text on each line than using a while loop? (1 million lines +)

bash

shell

有没有比使用 while 循环更有效的方法来创建每行都有重复文本的文件？（一百万行+）