Perl cut characters up to revolving regex, print to end of line
I have this data, from which I want to cut off the date and print everything from the initials to the end of the line.
I have mapped the initials.
30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs
30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661
30th Mar 2020 6:42:43 pm Boris Yeltsin: Cortese's ICE logs is for the Bloomberg Runs issue
30th Mar 2020 6:43:28 pm Charlie Brown: yeap
31st Mar 2020 4:11:22 am Ishtar Johnson: VK : RE: XS2018777099 & XS2018777172 - INC1018491954
31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843
31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
my @team = ("AP","II","DS","WJ", "JK","LC","BJ") ;
my ( $team_regex ) = map {qr /$_/} join "|", map {quotemeta} @team;
my @orderdTeam ;
my $filename = shift @ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
while (my $line = <$fh> ) {
    #$line =~ /($team_regex .*)/s ;
    $line = /($team_regex .*)/s ;
    print "$line\n";
}
close $fh;
For some reason, I am getting these uninitialized value errors.
johnswal@NYKPWM2037968 ~
$ ./cut_date_symphony.pl fooberry
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 1.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 2.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 3.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 4.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 5.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 6.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 7.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 8.
The commented-out line just prints the whole line; it does not cut out the date or time.
#$line =~ /($team_regex .*)/s ;
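So this is what I am looking for. "Tommy Boy NW:" and "Ishtar Johnson VK:" are part of our team, but from Europe. Only tickets from the US team members in the mapped array "@team_regex" should be shown, and the time and date should be cut out.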
BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380
Line 14 is this line:
$line = /($team_regex .*)/s ;
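The match operator (/.../) works on the variable that is bound to it with the =~ operator or, if no such variable is given, on $_. You don't use =~, so the match operator tries to match against $_. $_ contains no data, so Perl shows you the "uninitialized value" warning.
I think you want to match the regex against the contents of $line. So you need to use =~ instead of =, as in the line you commented out.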
$line =~ /($team_regex .*)/s ;
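To make the difference concrete, here is a small sketch. The sample line is taken from the question, but the shortened team list and the variable names $matched and $copy are only assumptions for illustration. It shows that =~ matches against $line and leaves it untouched, while plain = matches against $_ and stores the true/false result.

```perl
use strict;
use warnings;

# Shortened team list, for illustration only.
my $team_regex = qr/AP|DS|BJ/;
my $line = "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs";

# Binding with =~ matches against $line; $line itself is not modified.
my $matched = $line =~ /($team_regex:.*)/s;
print "matched: ", ($matched ? "yes" : "no"), "\n";
print "line is unchanged: $line\n";

# Plain assignment matches against $_ and stores the result of the match.
# $_ is set explicitly here so that no "uninitialized value" warning appears.
$_ = "no team code here";
my $copy = /($team_regex:.*)/s;
print "assignment stored: '$copy'\n";
```

In the original script the same thing happens on every loop iteration: $line is overwritten with the (false) result of matching against an undefined $_.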
But in a comment above, you explain that you commented it out because:
The commented line does not cut any characters out - it prints the whole line
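Of course it does, because you haven't written any code to change $line in any way. But what you want is in $1 after the match, so print that out.
$line =~ /($team_regex .*)/s ;
print $1;
But regex variables like $1 are only set on a successful match, so it's important to check that the match worked before printing it. You can do that by putting the match operator in an if statement.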
if ($line =~ /($team_regex .*)/s) {
    print $1;
}
Update: Oh, that won't work, because the team codes in your data are followed by a colon, not a space (as your regex assumes). So change it to this:
if ($line =~ /($team_regex:.*)/s) {
    print $1;
}
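As a rough end-to-end sketch of this approach, the following runs the guarded match over three lines from the sample data held in memory instead of a file. The names $alternation, @lines and @kept are assumptions made for this example and are not part of the original script.

```perl
use strict;
use warnings;

my @team        = ("AP", "II", "DS", "WJ", "JK", "LC", "BJ");
my $alternation = join "|", map { quotemeta } @team;
my $team_regex  = qr/$alternation/;

# Three lines taken from the sample data in the question.
my @lines = (
    "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs\n",
    "30th Mar 2020 6:43:28 pm Charlie Brown: yeap\n",
    "31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102\n",
);

my @kept;
for my $line (@lines) {
    # $1 is only used inside the if block, that is, after a successful match.
    if ($line =~ /($team_regex:.*)/s) {
        push @kept, $1;
    }
}

# Each captured string still ends with the newline of its input line.
print @kept;
```

Because of the /s modifier, .* also captures the trailing newline, so no extra "\n" is needed when printing.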
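See the following code snippet for how to achieve the desired result.
I think the regex for the team should be formed differently. Skip all records that do not match the regex. Then replace the first 5 data columns with nothing and print the result.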
use strict;
use warnings;
use feature 'say';
my @team = ("AP","II","DS","WJ", "JK","LC","BJ");
# Append ": " to every code, including the last one, before joining
my $re_team = join '|', map { "$_: " } @team;
my $filename = shift;
open(my $fh, '<', $filename)
    or die "Could not open file $filename $!";
while( <$fh> ) {
    chomp;
    next unless /$re_team/;   # skip records without a team code
    s/^(\S+ ){5}//;           # drop the five date/time columns
    say;
}
close $fh;
Input data
30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs
30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661
30th Mar 2020 6:42:43 pm Boris Yeltsin: Cortese's ICE logs is for the Bloomberg Runs issue
30th Mar 2020 6:43:28 pm Charlie Brown: yeap
31st Mar 2020 4:11:22 am Ishtar Johnson: VK : RE: XS2018777099 & XS2018777172 - INC1018491954
31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843
31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380
Output
Charlie Brown: BJ: Bloomberg Runs
Charlie Brown: DS: ICE DATA = INC1018483661
Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
Tommy Boy: JK: RE: Chris White books - INC1018497380
Replacing s/^(\S+ ){5}//; with s/^(\S+ ){7}//; gives the following output
BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380
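The effect of the two substitutions can be checked on a single record. This is only a sketch: the variable names are made up for the example, and the {7} variant assumes that every sender name consists of exactly two words, which holds for the sample data but may not hold in general.

```perl
use strict;
use warnings;

# One line from the input data above.
my $record = "31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380";

# Strip only the five date/time fields.
(my $without_date = $record) =~ s/^(?:\S+ ){5}//;

# Strip the date/time fields plus the two-word sender name.
(my $without_name = $record) =~ s/^(?:\S+ ){7}//;

print "$without_date\n";   # Tommy Boy: JK: RE: Chris White books - INC1018497380
print "$without_name\n";   # JK: RE: Chris White books - INC1018497380
```

Anchoring on the team code itself, rather than counting columns, avoids the dependency on the length of the sender name.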
Of course, the code can also be written as
use strict;
use warnings;
use feature 'say';
my @team = ("AP","II","DS","WJ", "JK","LC","BJ");
my $re_team = join '|', @team;
my $filename = shift;
open(my $fh, '<', $filename)
    or die "Could not open file $filename $!";
# Capture from the team code to the end of the line and print only that part
/((?:$re_team): .*)/ && say $1 while <$fh>;
close $fh;
or even like this
use strict;
use warnings;
use feature 'say';
my @team = ("AP","II","DS","WJ", "JK","LC","BJ");
my $re_team = join '|', @team;
# Reads from the files named on the command line, or from STDIN
/((?:$re_team): .*)/ && say $1 while <>;
Output
BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380
If the data needs to be collected
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @team = ("AP","II","DS","WJ", "JK","LC","BJ");
my $re_team = join '|', @team;
my @data;
# Store the captured part of each matching line
/((?:$re_team): .*)/ && push @data, $1 while <>;
say Dumper(\@data);
Output
$VAR1 = [
          'BJ: Bloomberg Runs',
          'DS: ICE DATA = INC1018483661',
          'AP: RE: Rolling 7yrs - INC1018497102',
          'JK: RE: Chris White books - INC1018497380'
        ];