Delete lines from an input file that match patterns listed in another file, which may sometimes be empty?

Scenario 1:

File 1: (the file length varies, and it may sometimes be an empty file)

exclude1
exclude2  
exclude3

File 2:

statement1 that has no excludes
statement2 that has exclude3
statement3 that has no excludes
statement4 that has no excludes
statement5 that has exclude1
statement6 that has exclude2
statement7 that has no excludes

Output:

statement1 that has no excludes
statement3 that has no excludes
statement4 that has no excludes
statement7 that has no excludes

Scenario 2:

File 1: (empty file)

empty file

File 2:

statement1 that has no excludes
statement2 that has no excludes
statement3 that has no excludes
statement4 that has no excludes

Output:

statement1 that has no excludes
statement2 that has no excludes
statement3 that has no excludes
statement4 that has no excludes

Script:

open (IN58, "<file2.txt") or die;
open (IN59, "<file1.txt") or die;
open (OUT42, ">output.txt") or die;
my @excludes = <IN59>;
chomp @excludes;
my $excludes = join ' |',@excludes;
while (<IN58>) {
next if /${excludes}/;
print OUT42 $_ ;
}
close (IN58);
close (IN59);
close (OUT42);

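This script works fine in scenario 1, but when the exclude file (i.e. file 1) is empty, it produces an empty output file instead of behaving the way I want. Any corrections to the code would be very helpful.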

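Try this.

Here I used a negated grep to drop the matching lines from the file, and the if condition checks whether the exclude file is empty.

If the file is empty, the condition is not satisfied, so the values in @br are left unchanged. If the condition is satisfied, the values in @br are replaced with the filtered ones.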

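use warnings;
use strict;

open my $fh1, "<", "f1.txt" or die "Cannot open f1.txt: $!";
my @ar = <$fh1>;
my $excluded = join("|", @ar);
$excluded =~ s/\s+//g;   # your input has some trailing spaces, so I used substitution instead of chomp

open my $fh2, "<", "f2.txt" or die "Cannot open f2.txt: $!";
open my $nw, ">", "newfile.txt" or die "Cannot open newfile.txt: $!";
my @br = <$fh2>;
# Filter only when there is at least one pattern; an empty pattern would match every line.
@br = grep { !/$excluded/ } @br if $excluded ne "";
print $nw @br;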

The trick here is testing for the excludes efficiently. You can do that by building a regex from the keywords and then rejecting any line that matches at all.

So:

#!/usr/bin/perl
use strict;
use warnings;

my @excludes = qw ( exclude1
    exclude2
    exclude3 );

my $exclude_regex = join( "|", map {quotemeta} @excludes );
$exclude_regex = qr/$exclude_regex/;


while (<DATA>) {
    print unless /$exclude_regex/;
}


__DATA__
statement1 that has no excludes
statement2 that has exclude3
statement3 that has no excludes
statement4 that has no excludes
statement5 that has exclude1
statement6 that has exclude2
statement7 that has no excludes

Now, the problem here is of course that an empty 'match' will match anything, so effectively you 'wildcard' match at that point (and exclude everything).
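To make that failure mode concrete, here is a minimal sketch of what happens when the exclude list is empty. The helper name `build_exclude_regex` and the sample lines are only illustrative and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative helper: join the (quoted) keywords into one alternation.
sub build_exclude_regex {
    my @excludes    = @_;
    my $alternation = join( "|", map {quotemeta} @excludes );
    return qr/$alternation/;
}

my @lines = (
    "statement1 that has no excludes",
    "statement2 that has exclude3",
);

# No keywords at all: the compiled pattern is empty and matches every line.
my $empty_regex = build_exclude_regex();
my @kept        = grep { !/$empty_regex/ } @lines;

print "regex: $empty_regex\n";
print "lines kept: ", scalar(@kept), "\n";
```

With no keywords, every line counts as a match, so nothing survives the filter. That is the empty output file from scenario 2.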

The easiest way to handle this is to insert a 'default' pattern that your data lines will never match, e.g. an empty line:

my $exclude_regex = join( "|", '^$', map {quotemeta} (  @excludes ) ) ;

This will filter out blank lines and anything that includes one of your exclude words, generating a regex like:

(?^:^$|exclude1|exclude2|exclude3)
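If blank lines in the data need to be kept, one possible variation (not what the answer above uses) is to make the placeholder an empty negative lookahead, `(?!)`, which fails at every position. The sketch below assumes a hypothetical helper named `build_never_match_regex` and made-up sample lines.

```perl
use strict;
use warnings;

# Hypothetical helper: '(?!)' keeps the alternation non-empty,
# but unlike '^$' it can never match, not even on a blank line.
sub build_never_match_regex {
    my @excludes    = @_;
    my $alternation = join( "|", '(?!)', map {quotemeta} @excludes );
    return qr/$alternation/;
}

my @lines = (
    "statement1 that has no excludes\n",
    "\n",
    "statement2 that has exclude3\n",
);

my $none = build_never_match_regex();                         # empty exclude file
my $some = build_never_match_regex(qw( exclude1 exclude3 ));

my @kept_none = grep { !/$none/ } @lines;    # nothing is excluded
my @kept_some = grep { !/$some/ } @lines;    # only the exclude3 line is dropped

print "kept with no excludes: ", scalar(@kept_none), "\n";
print "kept with excludes:    ", scalar(@kept_some), "\n";
```

Whether to prefer `^$` or `(?!)` depends on whether dropping blank lines is acceptable for your data.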

Adding in the file-reading bit:

#!/usr/bin/perl
use strict;
use warnings;

open( my $data,     '<', "file2.txt" )  or die;
open( my $excludes, '<', "file1.txt" )  or die;
open( my $output,   '>', "output.txt" ) or die;

chomp( my @excludes = <$excludes> );
my $exclude_regex = join( "|", '^$', map {quotemeta} (@excludes) );
$exclude_regex = qr/$exclude_regex/;
print $exclude_regex, "\n";

select $output;
while (<$data>) {
    print unless m/$exclude_regex/;
}

And since you seem to have a space in your 'regex assembly', you might want to consider changing your exclude regex to:

$exclude_regex = qr/\b$exclude_regex\b/;

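This will include word boundaries in the pattern match (although you then slightly break the 'empty line' match, which won't match any more, but it still works as a placeholder).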
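A small sketch of the difference the word boundaries make. The sample strings (including `exclude10`) are made up for illustration, and the alternation is wrapped in an explicit `(?:...)` group here so that `\b` applies to every alternative.

```perl
use strict;
use warnings;

my @excludes    = qw( exclude1 exclude2 );
my $alternation = join( "|", '^$', map {quotemeta} @excludes );

my $plain   = qr/$alternation/;              # substring match
my $bounded = qr/\b(?:$alternation)\b/;      # whole-word match

# Compare both patterns on a few sample lines.
for my $line ( "has exclude1", "has exclude10", "" ) {
    printf "%-16s plain=%d bounded=%d\n",
        "'$line'",
        ( $line =~ $plain   ? 1 : 0 ),
        ( $line =~ $bounded ? 1 : 0 );
}
```

The plain pattern also catches `exclude10` because `exclude1` is a substring of it, while the bounded pattern does not. The empty string matches only the plain pattern, which is the "broken" empty-line placeholder mentioned above.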

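While we're here:

  • 3-argument open with lexical file handles is good.
  • use strict; use warnings; should be considered mandatory.
  • Consider what happens if your exclude file contains regex metacharacters. That's why quotemeta is there, to treat them as literals... but you may find it useful to support regexes in the exclude file.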
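If you do want to support regexes in the exclude file, one way to sketch it is to compile each line separately and fall back to literal matching when a line is not a valid pattern. The helper name `compile_excludes`, the fallback policy, and the sample patterns are all assumptions made for illustration, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: treat each exclude line as a regex; if it does not
# compile, warn and match it as literal text instead.
sub compile_excludes {
    my @patterns = @_;
    my @compiled;
    for my $pattern ( grep { length } @patterns ) {
        my $regex = eval { qr/$pattern/ };
        if ( !defined $regex ) {
            warn "Invalid regex '$pattern', treating it as literal text\n";
            $regex = qr/\Q$pattern\E/;
        }
        push @compiled, $regex;
    }

    # Nothing to exclude: return a pattern that can never match.
    return qr/(?!)/ unless @compiled;

    my $alternation = join( "|", @compiled );
    return qr/$alternation/;
}

# 'exclude\d+' is a valid regex; 'foo(' is not, so it is matched literally.
my $regex = compile_excludes( 'exclude\d+', 'foo(' );

for my $line ( "has exclude42", "calls foo(bar)", "nothing here" ) {
    print "$line\n" unless $line =~ $regex;
}
```

Keep in mind that accepting regexes means the exclude file's author controls the matching cost, so this is best reserved for files you trust.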