Trouble with line iteration using while or for

Question

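I am trying to process each line of a file through a perl script, rather than sending the whole file to the perl script and putting that much data into memory at once.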

In a shell script, I started what I believe is a line iteration, as follows:

while read line
do
  perl script.pl --script=options "$line"
done < input

How do I save the data to an output file (>> output) while doing this?

while read line
do
  perl script.pl --script=options "$line"
done < input
>> output

If splitting the file uses less memory, then I am also having trouble with my for statement:

for file in /dev/*
   do 
       split -l 1000 $file prefix
done < input
## Where do I save the output?

for file in /dev/out/*
   do 
      perl script.pl --script=options

etc...

Which method is the most memory efficient?

Answer 1

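Try this:

# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
  perl script.pl --script=options "$line" >> "out"
done < input

"out" is the name of your output file.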

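The redirection can also be attached to the loop as a whole, so the output file is opened once instead of once per line. A minimal sketch of that variant follows; `process_line` is a hypothetical stand-in for `perl script.pl --script=options "$line"`, and the temporary directory and sample input exist only to make the example runnable.

```shell
#!/bin/sh
# Hypothetical stand-in for: perl script.pl --script=options "$line"
process_line() {
  printf 'processed: %s\n' "$1"
}

# Sample data in a scratch directory (illustration only)
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/input"

# Both redirections belong to the loop itself:
# input is read from the file, everything the body prints is appended to out
while IFS= read -r line
do
  process_line "$line"
done < "$workdir/input" >> "$workdir/out"

cat "$workdir/out"
```

If the command inside the loop reads standard input itself, it will swallow the remaining lines of the file; giving that command `< /dev/null` avoids this.
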
Answer 2

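You can also process very large files line by line inside the perl script itself, without loading the whole file into memory. To do that, you just need to wrap the body of your current perl script (which I hope does not read the whole file into memory again :) ) in a while loop. For example: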

my $line;
while ($line = <>) {   # <> reads one line at a time from the files named in @ARGV, or from STDIN
    # your script text here, referring to the $line variable instead of the parameter variable
}

In this perl script you can also write the results to an output file. For example, if the result is stored in the variable $res, you can do this:

open (my $fh, ">>", "out") or die "ERROR: $!"; # opening a filehandle in append mode
my $line;
while ($line = <>) {
    # your script text here, referring to the $line variable instead of the parameter variable
    print $fh $res, "\n"; # writing to the filehandle
}
close $fh; # closing the filehandle

Answer 3

I solved my problem:

  split -l 100000 input /dev/shm/split/split.input.txt.
  find /dev/shm/split/ -type f -name '*.txt.*' -exec perl script.pl --script=options {} + > output

This made my script process the file faster.
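For anyone who wants to try the split-then-find pattern on a small scale first, here is a rough sketch. The scratch directory, the `chunk.` prefix and the 4-line chunk size are made up for the demo, and `cat` is a stand-in for `perl script.pl --script=options`. Note that `split` does not create the target directory, so it has to exist beforehand.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir -p "$workdir/split"     # split will not create this directory itself

# Sample input: 10 numbered lines (illustration only)
i=1
while [ "$i" -le 10 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/input"

# Cut the input into chunks of 4 lines each: chunk.aa, chunk.ab, chunk.ac
split -l 4 "$workdir/input" "$workdir/split/chunk."

# "-exec ... {} +" passes many file names to a single command invocation.
# cat stands in for the perl script here (hypothetical substitution).
find "$workdir/split" -type f -name 'chunk.*' -exec cat {} + > "$workdir/output"

ls "$workdir/split"
wc -l < "$workdir/output"
```

find does not promise any particular order for the files it hands over, so if the order of lines in the output matters, the chunks need to be sorted before processing.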

shell

perl

split

find