Perl cut characters up to revolving regex, print to end of line

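I have this data, from which I want to remove the date and print everything from the initials to the end of the line. I have mapped the initials.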

30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs
30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661
30th Mar 2020 6:42:43 pm Boris Yeltsin: Cortese's ICE logs is for the Bloomberg Runs issue
30th Mar 2020 6:43:28 pm Charlie Brown: yeap
31st Mar 2020 4:11:22 am Ishtar Johnson: VK : RE: XS2018777099 & XS2018777172 - INC1018491954
31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843
31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380

Here is the code:

#!/usr/bin/perl

use strict;
use warnings;

my @team = ("AP","II","DS","WJ", "JK","LC","BJ") ;
my ( $team_regex ) = map {qr /$_/} join "|", map {quotemeta} @team;

my @orderdTeam ;
my $filename = shift @ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
while (my $line = <$fh> ) {
        #$line =~ /($team_regex .*)/s  ;
        $line = /($team_regex .*)/s  ;
        print "$line\n";

}
close $fh;

For some reason, I am getting these "uninitialized value" warnings.

johnswal@NYKPWM2037968 ~
$ ./cut_date_symphony.pl fooberry
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 1.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 2.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 3.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 4.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 5.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 6.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 7.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 8.

The commented-out line just prints the whole line; it does not remove the date or the time.

#$line =~ /($team_regex .*)/s  ;

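So this is what I am looking for. "Tommy Boy NW:" and "Ishtar Johnson VK:" are members of our team, but they are in Europe. Only tickets from the US team members listed in the mapped array "@team_regex" should be displayed, with the date and time removed.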

BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380

Line 14 is this line:

$line = /($team_regex .*)/s  ;

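The match operator (/.../) works on the variable that is bound to it with the =~ operator or, if no such variable is given, on $_. You are not using =~, so the match operator tries to match against $_, which contains no data, and Perl gives you the "Use of uninitialized value" warning.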

I think you want to match the regex against the contents of $line. So you need to use =~ instead of =, as in the line you commented out.

$line =~ /($team_regex .*)/s  ;
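The difference between the two operators can be seen in a minimal sketch. The sample line is taken from the data above; the simplified pattern and the variable names $found and $result are made up for illustration:

```perl
use strict;
use warnings;

my $line = "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs\n";

# With =~ the match is bound to $line; in list context it returns the captures.
my ($found) = $line =~ /(BJ:.*)/;

# With a plain = the match runs against $_ instead of $line, and the
# assignment stores the result of the match (a true or false value).
$_ = "something else entirely";
my $result = /(BJ:.*)/;

print "bound match captured: $found\n";
print "unbound match was ", ($result ? "true" : "false"), "\n";
```

The second match never looks at $line, which is why an assignment with = cannot trim the line.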

But in a comment above, you explain that you commented it out because:

The commented line does not cut any characters out - it prints the whole line

Of course it does, because you have not written any code that changes $line in any way. But what you want is in $1 after the match, so print that out.

$line =~ /($team_regex .*)/s  ;
print $1;

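But regex variables like $1 are only set on a successful match, so it is important to check that the match worked before printing it. You can do that by putting the match operator in an if statement.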

if ($line =~ /($team_regex .*)/s) {
  print $1;
}

Update: Oh, and that will not work, because the team codes in your data are followed by a colon, not a space (as your regex assumes). So change it to this:

if ($line =~ /($team_regex:.*)/s) {
  print $1;
}
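Putting these pieces together, a complete version might look like the sketch below. The helper name ticket_text, the \b word boundary, and the inline sample lines (copied from the question's data) are assumptions added for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

my @team = ("AP", "II", "DS", "WJ", "JK", "LC", "BJ");

# Build one alternation from the team codes. quotemeta guards against
# regex metacharacters, and (?:...) keeps the alternation grouped.
my $alternation = join "|", map { quotemeta } @team;
my $team_regex  = qr/(?:$alternation)/;

# Hypothetical helper: returns the text from the team code to the end of
# the line, or undef when no team code followed by a colon is found.
sub ticket_text {
    my ($line) = @_;
    chomp $line;
    # \b keeps a code from matching inside a longer word (added assumption).
    return $line =~ /\b($team_regex:.*)/ ? $1 : undef;
}

my @sample = (
    "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs\n",
    "30th Mar 2020 6:43:28 pm Charlie Brown: yeap\n",
    "31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843\n",
    "31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102\n",
);

for my $line (@sample) {
    my $text = ticket_text($line);
    print "$text\n" if defined $text;   # skip lines without a US team code
}
```

For these four sample lines, only the BJ and AP tickets are printed. In the real script the loop would read from the file handle instead of @sample.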

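See the following code snippet for one way to achieve the desired result.

I think the regex for the team should be formed differently. Skip all records that do not match the regex. Then replace the first 5 data columns with nothing and print the result.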

use strict;
use warnings;
use feature 'say';

my @team = ("AP","II","DS","WJ", "JK","LC","BJ");

# Alternation of team codes, each followed by ": " (AP: |II: |...|BJ: )
my $re_team = join '|', map { "$_: " } @team;

my $filename = shift;

open(my $fh, '<', $filename)
    or die "Could not open file $filename $!";

while( <$fh> ) {
    chomp;
    next unless /$re_team/;
    s/^(\S+ ){5}//;
    say;
}

close $fh;

Input data

30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs
30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661
30th Mar 2020 6:42:43 pm Boris Yeltsin: Cortese's ICE logs is for the Bloomberg Runs issue
30th Mar 2020 6:43:28 pm Charlie Brown: yeap
31st Mar 2020 4:11:22 am Ishtar Johnson: VK : RE: XS2018777099 & XS2018777172 - INC1018491954
31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843
31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380

Output

Charlie Brown: BJ: Bloomberg Runs
Charlie Brown: DS: ICE DATA = INC1018483661
Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
Tommy Boy: JK: RE: Chris White books - INC1018497380

Replacing s/^(\S+ ){5}//; with s/^(\S+ ){7}//; gives the following output

BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380
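Note that a count of 7 fits this sample because every sender name happens to be exactly two words. A rough sketch that does not depend on the length of the name, assuming the sender name itself never contains ": ", could look like this. The helper name strip_prefix and the three-word name in the second call are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the five date/time fields, then the sender
# name up to the first ": ", however many words the name has.
sub strip_prefix {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^(?:\S+ ){5}//;   # date and time, e.g. "30th Mar 2020 5:53:18 pm "
    $line =~ s/^.*?: //;         # sender, e.g. "Charlie Brown: "
    return $line;
}

print strip_prefix("30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs\n"), "\n";
# Made-up sender with three words, to show the name length does not matter
print strip_prefix("31st Mar 2020 7:26:40 am Ann Mary Smith: AP: RE: Rolling 7yrs\n"), "\n";
```

The non-greedy .*? stops at the first ": ", so the team code and everything after it are left in place.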

Of course, the code can also be written as

use strict;
use warnings;
use feature 'say';

my @team = ("AP","II","DS","WJ", "JK","LC","BJ");

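# Alternation of team codes, each followed by ": "
my $re_team = join '|', map { "$_: " } @team;

my $filename = shift;

open(my $fh, '<', $filename)
    or die "Could not open file $filename $!";

# Capture from the team code to the end of the line and print the capture
/((?:$re_team).*)/ && say $1  while <$fh>;

close $fh;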

or even like this

use strict;
use warnings;
use feature 'say';

my @team = ("AP","II","DS","WJ", "JK","LC","BJ");

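# Alternation of team codes, each followed by ": "
my $re_team = join '|', map { "$_: " } @team;

# Read from the files named on the command line (or STDIN) and print
# everything from the team code to the end of the line
/((?:$re_team).*)/ && say $1  while <>;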

Output

BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380

If the data needs to be collected

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my @team = ("AP","II","DS","WJ", "JK","LC","BJ");

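# Alternation of team codes, each followed by ": "
my $re_team = join '|', map { "$_: " } @team;

my @data;

# Store everything from the team code to the end of the line
/((?:$re_team).*)/ && push @data, $1  while <>;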

say Dumper(\@data);

Output

$VAR1 = [
          'BJ: Bloomberg Runs',
          'DS: ICE DATA = INC1018483661',
          'AP: RE: Rolling 7yrs - INC1018497102',
          'JK: RE: Chris White books - INC1018497380'
        ];