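Parsing strings from text file
I am trying to parse the following text file (test.txt) with a Perl script to get the output format shown at the bottom (bug ID, description, and user name). Can you help me do this?
Data in test.txt: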
(1111) user1 <user1@mail.com> 112111: description1 - some dummy string
(6473) user2 <user2@mail.com> 112112: description2 - some test string
(1999) user3 <user3@mail.com> 129119: description3 - some tes3 string
(3975) user3 <user3@mail.com> 196234: description4 - some tes4 string
This is the script I am trying:
#!perl -w
#use strict;
no warnings;
my $ActivityLog = "test.txt";
my $ActListLog = "test3.txt";
open(FILE, "<$ActivityLog");
@prelist = <FILE>;
close (FILE);
foreach (@prelist)
{
if ($_ !~ /Bring over:/)
{
@postlist = split(".com> ", $_);
push (@result, $postlist[1]);
}
}
unlink $ActListLog;
open(LISTNAME,">$ActListLog")||die("cannot open the Input file");
print LISTNAME @result;
close LISTNAME;
Desired output:
112111: description1 user1
112112: description2 user2
129119: description3 user3
196234: description4 user3
If the user and the description contain no spaces, as in your sample data:
#!/usr/bin/perl
use strict;
use warnings;
my $ActivityLog = 'test.txt';
open my $fh,'<', $ActivityLog or die "$ActivityLog: $!";
while(<$fh>) {
    print "$2 $3 $1\n" if(/^\S+ (\S+) \S+ (\d+:) (\S+)/);
    # captures: $1 = user, $2 = number, $3 = description
}
close $fh;
If the user and the description can contain spaces:
#!/usr/bin/perl
use strict;
use warnings;
my $ActivityLog = 'test.txt';
open my $fh,'<', $ActivityLog or die "$ActivityLog: $!";
while(<$fh>) {
    print "$2 $3 $1\n" if(/^\(\d+\)\s+(.*)\s+<\S+?>\s+(\d+:)\s+(.*?)\s+-/);
    # captures: $1 = user, $2 = number, $3 = description
}
close $fh;
You can split on whitespace and keep only the 3 items you need. Then print them in the desired order:
use warnings;
use strict;
while (<DATA>) {
my (undef, $user, undef, $id, $desc) = split;
print "$id $desc $user\n";
}
__DATA__
(1111) user1 <user1@mail.com> 112111: description1 - some dummy string
(6473) user2 <user2@mail.com> 112112: description2 - some test string
(1999) user3 <user3@mail.com> 129119: description3 - some tes3 string
(3975) user3 <user3@mail.com> 196234: description4 - some tes4 string
Prints:
112111: description1 user1
112112: description2 user2
129119: description3 user3
196234: description4 user3
Use this Perl one-liner:
perl -lne '( $username, $bugid, $description ) = m{ < (\S+) @ \S+ \s+ (\S+:) \s+ (\S+) }xms; print join " ", $bugid, $description, $username;' test.txt > test3.txt
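The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.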
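To make the effect of the flags more concrete, here is a rough sketch of what the one-liner does when written out as an ordinary script. The sub name reorder_lines and the in-memory demo input are only illustrative and not part of the original answer. Unlike the one-liner, this version skips lines that do not match, to avoid "uninitialized value" warnings under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: roughly what `perl -lne '...'` does, on explicit handles.
sub reorder_lines {
    my ($in, $out) = @_;
    local $\ = $/;                 # -l: print appends the line separator
    while (my $line = <$in>) {     # -n: one input line per iteration
        chomp $line;               # -l: strip the separator before the code runs
        my ($username, $bugid, $description) =
            $line =~ m{ < (\S+) \@ \S+ \s+ (\S+:) \s+ (\S+) }xms;
        next unless defined $username;   # assumption: silently skip non-matching lines
        print {$out} join ' ', $bugid, $description, $username;
    }
    return;
}

# Demo on an in-memory file holding one sample line from the question.
my $sample = '(1111) user1 <user1@mail.com> 112111: description1 - some dummy string' . "\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
reorder_lines($in, \*STDOUT);
close $in;
```

To process the real file, open test.txt for reading and test3.txt for writing and pass those two handles instead.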
The regex uses this modifier:
/x : Ignore whitespace and comments, for readability.
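As an illustration of what /x allows, the sketch below spreads a pattern over several lines and comments each piece. It is a variant of the patterns shown in this thread that also accepts spaces in the user name and the description. The names $line_re and parse_line, and the sample line with "John Smith", are made up for this example.

```perl
use strict;
use warnings;

# With /x, whitespace and "#" comments inside the pattern are ignored.
my $line_re = qr{
    ^ \( \d+ \)      # leading counter such as "(1111)", not captured
    \s+ (.*?)        # capture 1: user name, may contain spaces
    \s+ < \S+? >     # mail address in angle brackets, not captured
    \s+ (\d+:)       # capture 2: bug id with its trailing colon
    \s+ (.*?)        # capture 3: description, may contain spaces
    \s+ -            # the " -" that ends the description
}x;

# Returns (bug id, description, user), or nothing if the line does not match.
sub parse_line {
    my ($line) = @_;
    my ($user, $id, $desc) = $line =~ $line_re or return;
    return ($id, $desc, $user);
}

# Hypothetical input line with spaces in the user name and the description.
print join(' ', parse_line(
    '(1111) John Smith <js@mail.com> 112111: fix the parser - some dummy string'
)), "\n";
```

Note that Perl still interpolates variables inside such comments, so it is safest to keep $ and @ out of them.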
See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start