In Perl, how can I filter all log files in a directory and extract the interesting lines?

Question

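I'm trying to select only the .log files in my directory, then search those files for the word "unbound" and print each matching line in full to a new output file that has the same name as the log file (number###.log) but a .txt extension. This is what I have so far: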

#!/usr/bin/perl

  use strict;
  use warnings;

  my $path = $ARGV[0];
  my $outpath = $ARGV[1];
  my @files;
  my $files;

  opendir(DIR,$path) or die "$!";
  @files = grep { /\.log$/} readdir(DIR);


  my @out;
  my $out;
  opendir(OUT,$outpath) or die "$!";

  my $line;
  foreach $files (@files) {
  open (FILE, "$files");
  my @line = <FILE>;
  my $regex = Unbound;
  open (OUT, ">>$out");
  print grep {$line =~ /$regex/ } <>;
   } 
  close OUT;
  close FILE;

  closedir(DIR);
  closedir (OUT);

I'm a beginner, and I really don't know how to create a new text file from the output I get.

Answer 1

A few things I'd suggest to improve this code:

Declare the loop iterator in the loop: foreach my $file ( @files ) {
Use 3-argument open: open ( my $input_fh, "<", $filename );
Use glob rather than opendir followed by grep: foreach my $file ( <$path/*.log> ) {
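grep is good for extracting things into an array. Your grep reads the whole file just to print it, which isn't necessary, although it doesn't matter much if the files are short.
perltidy is great for reformatting code.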
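You're opening 'OUT' as a directory handle on $outpath (I think?) and then reusing OUT as a file handle on $out, which is never assigned. That won't work: opendir isn't valid for output. To write to several different files, you need to open a separate file handle for each output file.
Because you're using opendir, you actually get bare file names, not full paths, so you may be in the wrong place when you try to open the files. Prepending the path name or doing a chdir are possible solutions. But that's one of the reasons I like glob: it returns the path as well.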
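The readdir-versus-glob point above can be checked with a small sketch. It builds a temporary directory (using the core File::Temp module) containing the hypothetical files number001.log, number002.log and notes.txt, then shows that readdir yields bare names that must be joined with the directory (here via File::Spec->catfile) before they can be tested or opened, whereas glob returns paths that already include the directory. It assumes a Unix-like system whose temporary directory path contains no spaces, because glob splits its pattern on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory holding two hypothetical log files and one other file.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(number001.log number002.log notes.txt)) {
    open( my $fh, '>', File::Spec->catfile( $dir, $name ) )
        or die "Cannot create $name: $!";
    close $fh;
}

# readdir returns bare names ('number001.log'), so join each one with
# the directory first, then keep only regular files ending in .log.
opendir( my $dh, $dir ) or die "Cannot open directory '$dir': $!";
my @from_readdir =
    grep { /\.log\z/ && -f $_ }
    map  { File::Spec->catfile( $dir, $_ ) }
    readdir $dh;
closedir $dh;

# glob returns names that already include the directory part.
my @from_glob = glob("$dir/*.log");

print "readdir: $_\n" for sort @from_readdir;
print "glob:    $_\n" for @from_glob;
```

Both lists end up holding the same two paths; the difference is only in how much work it takes to get there.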

With all that in mind, how about this:

#!/usr/bin/perl

use strict;
use warnings;
use File::Basename;

#Extract paths
my $input_path  = $ARGV[0];
my $output_path = $ARGV[1];

#Error if paths are invalid. 
unless (defined $input_path
    and -d $input_path
    and defined $output_path
    and -d $output_path )
{
    die "Usage: $0 <input_path> <output_path>\n";
}

foreach my $filename (<$input_path/*.log>) {

   # extract the 'name' bit of the filename. 
   # be slightly careful with this - it's based 
   # on an assumption which isn't always true. 
   # File::Spec is a more powerful way of accomplishing this,
   # but this should grab 'number####' from /path/to/file/number####.log
   my $output_file = basename ( $filename, '.log' );

   #open input and output filehandles. 
   open( my $input_fh, "<", $filename ) or die $!;
   open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;

   print "Processing $filename -> $output_path/$output_file.txt\n";

   #iterate input, extracting into $line
   while ( my $line = <$input_fh> ) {
        #check if $line matches your RE. 
        if ( $line =~ m/Unbound/ ) {
            #write it to output. 
            print {$output_fh} $line;
        }
   }
   #tidy up our filehandles. Although technically, they'll 
   #close automatically because they leave scope
   close($output_fh);
   close($input_fh);
}
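The comment in the script warns that basename relies on an assumption that isn't always true, without saying which one. One plausible case, assumed here purely for illustration, is that File::Basename's basename strips a suffix only when it matches exactly, including case. fileparse accepts regex suffixes, so a case-insensitive pattern handles both spellings. The paths below are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename fileparse);

# basename() only strips the suffix when it matches literally, case included.
my $plain  = basename( '/path/to/file/number0001.log', '.log' );    # 'number0001'
my $shouty = basename( '/path/to/file/NUMBER0002.LOG', '.log' );    # 'NUMBER0002.LOG'

# fileparse() treats each suffix as a pattern, so a case-insensitive
# regex strips '.LOG' as well; the first value returned is the name.
my ($name) = fileparse( '/path/to/file/NUMBER0002.LOG', qr/\.log/i );    # 'NUMBER0002'

print "$plain\n$shouty\n$name\n";
```

If the log files are known to always use a lowercase .log extension, the original basename call is perfectly adequate.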

Answer 2

Here is a script that uses Path::Tiny. At this stage of your learning you are probably better off understanding @Sobrique's solution first, but modules such as Path::Tiny or Path::Class make it easier to write scripts like this one quickly and correctly.

Also, I haven't really tested this script, so watch out for bugs.

#!/usr/bin/env perl

use strict;
use warnings;

use Path::Tiny;

run(\@ARGV);

sub run {
    my $argv = shift;
    unless (@$argv == 2) {
        die "Need source and destination paths\n";
    }
    my $it = path($argv->[0])->realpath->iterator({
        recurse => 0,
        follow_symlinks => 0,
    });
    my $outdir = path($argv->[1])->realpath;

    while (my $path = $it->()) {
        next unless -f $path;
        next unless $path =~ /[.]log\z/;

        my $logfh = $path->openr;
        my $outfile = $outdir->child($path->basename('.log') . '.txt');
        my $outfh;

        while (my $line = <$logfh>) {
            next unless $line =~ /Unbound/;
            unless ($outfh) {
                $outfh = $outfile->openw;
            }
            print $outfh $line;
        }
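        # $outfh is still undefined if no line matched, so only close it
        # when an output file was actually opened.
        if ($outfh) {
            close $outfh
                or die "Cannot close output '$outfile': $!";
        }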
    }
}
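The pattern in the script above of opening the output file only when the first matching line turns up does not depend on Path::Tiny. Below is a core-Perl sketch of the same per-file loop, assuming the number###.log naming from the question; extract_matches is a hypothetical helper name, and the demo logs are made up. It reads each log line by line and guards the final close, since the output handle stays undefined when nothing matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copy the lines of $log that match $re into
# $outdir/<name>.txt. The output file is opened only on the first match,
# so a log without matches leaves no empty .txt file behind.
# Returns the number of lines written.
sub extract_matches {
    my ( $log, $outdir, $re ) = @_;
    my $outfile =
        File::Spec->catfile( $outdir, basename( $log, '.log' ) . '.txt' );

    open( my $in, '<', $log ) or die "Cannot open '$log': $!";
    my $out;
    my $count = 0;
    while ( my $line = <$in> ) {
        next unless $line =~ $re;
        unless ($out) {
            open( $out, '>', $outfile ) or die "Cannot open '$outfile': $!";
        }
        print {$out} $line;
        $count++;
    }
    close $in;
    if ($out) {
        close $out or die "Cannot close '$outfile': $!";
    }
    return $count;
}

# Demonstration on a throwaway directory with two hypothetical logs.
my $dir  = tempdir( CLEANUP => 1 );
my %logs = (
    'number001.log' => "ok\nUnbound variable x\nfine\nUnbound variable y\n",
    'number002.log' => "nothing interesting here\n",
);
for my $name ( keys %logs ) {
    open( my $fh, '>', File::Spec->catfile( $dir, $name ) ) or die $!;
    print {$fh} $logs{$name};
    close $fh;
}

my %written;
for my $log ( glob("$dir/*.log") ) {
    $written{ basename($log) } = extract_matches( $log, $dir, qr/Unbound/ );
}
print "$_: $written{$_} line(s)\n" for sort keys %written;
```

Writing the .txt files into the same directory is safe here only because glob is evaluated before the loop starts and matches *.log, not *.txt.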

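Notes

realpath will croak (throw an exception) if it cannot resolve the path.
The same goes for openr and openw: they croak if the file cannot be opened.
I read the input files line by line so that the program's memory footprint does not depend on the size of the input files.
I don't open an output file until I know I have a matching line to print to it.
When matching file extensions with a regular expression, keep in mind that \n is a valid character in Unix file names and that the $ anchor also matches just before a trailing newline, which is why the script uses \z instead.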
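The last note can be demonstrated without touching the file system. This sketch uses a hypothetical list of names, one of which really ends in a newline character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory listing; the second name ends in "\n",
# which is legal in a Unix file name.
my @names = ( "number001.log", "number002.log\n", "number003.txt" );

# $ also matches just before a trailing newline; \z only at the very end.
my @with_dollar = grep { /[.]log$/ } @names;
my @with_z      = grep { /[.]log\z/ } @names;

printf "with \$:  %d match(es)\n", scalar @with_dollar;    # 2
printf "with \\z: %d match(es)\n", scalar @with_z;         # 1
```

With $, the name ending in a newline slips through the extension check; with \z, only the genuine .log name is kept.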
