Parsing strings from text file

Question

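I'm trying to parse the following text file (test.txt) with a Perl script to get the output format shown at the bottom (bug ID, description, and username). Can you help me achieve this?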

Data in test.txt:

(1111) user1 <user1@mail.com> 112111: description1 - some dummy string
(6473) user2 <user2@mail.com> 112112: description2 - some test string
(1999) user3 <user3@mail.com> 129119: description3 - some tes3 string
(3975) user3 <user3@mail.com> 196234: description4 - some tes4 string

Here is the script I'm trying:

#!perl -w
#use strict;

no warnings;

my $ActivityLog = "test.txt";
my $ActListLog = "test3.txt";

open(FILE, "<$ActivityLog");
            @prelist = <FILE>;
            close (FILE);
            
            foreach (@prelist)
            {
                if ($_ !~ /Bring over:/)
                {
                    @postlist = split(".com> ", $_);    
                    push (@result, $postlist[1]);
                }
            }   
            unlink $ActListLog;
            open(LISTNAME,">$ActListLog")||die("cannot open the Input file");
            print LISTNAME @result;
            close LISTNAME;

Desired output:

112111: description1 user1
112112: description2 user2
129119: description3 user3
196234: description4 user3

Answer 1

If the user and description contain no spaces, as in your sample data:

#!/usr/bin/perl

use strict;
use warnings;

my $ActivityLog = 'test.txt';

open my $fh,'<', $ActivityLog or die "$ActivityLog: $!";
while(<$fh>) {
    print "$2 $3 $1\n" if(/^\S+ (\S+) \S+ (\d+:) (\S+)/);
    #                           user      number description
}
close $fh;

If the user and description can contain spaces:

#!/usr/bin/perl

use strict;
use warnings;

my $ActivityLog = 'test.txt';

open my $fh,'<', $ActivityLog or die "$ActivityLog: $!";
while(<$fh>) {
    print "$2 $3 $1\n" if(/^\(\d+\)\s+(.*)\s+<\S+?>\s+(\d+:)\s+(.*?)\s+-/);
    #                                 user            number   description
}
close $fh;
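
To see how the second pattern behaves when the fields contain spaces, here is a minimal sketch that applies the same regex to one made-up line. The helper name parse_line and the sample line are assumptions for illustration only; they are not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex above; returns nothing on no match.
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^\(\d+\)\s+(.*)\s+<\S+?>\s+(\d+:)\s+(.*?)\s+-/) {
        # $1 = user, $2 = number (with colon), $3 = description
        return "$2 $3 $1";
    }
    return;
}

# Made-up input with spaces in both the user name and the description.
my $sample = '(1111) John Smith <john@mail.com> 112111: login page crash - some dummy string';
print parse_line($sample), "\n";   # prints: 112111: login page crash John Smith
```

Because the description group is non-greedy, it stops at the first whitespace followed by a hyphen, so a description that itself contains " -" would be cut short.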

Answer 2

You can split on whitespace and keep only the 3 items you need, then print them in the order you want:

use warnings;
use strict;

while (<DATA>) {
    my (undef, $user, undef, $id, $desc) = split;
    print "$id $desc $user\n";
}

__DATA__
(1111) user1 <user1@mail.com> 112111: description1 - some dummy string
(6473) user2 <user2@mail.com> 112112: description2 - some test string
(1999) user3 <user3@mail.com> 129119: description3 - some tes3 string
(3975) user3 <user3@mail.com> 196234: description4 - some tes4 string

Prints:

112111: description1 user1
112112: description2 user2
129119: description3 user3
196234: description4 user3

Answer 3

Use this Perl one-liner:

perl -lne '( $username, $bugid, $description ) = m{ < (\S+) @ \S+ \s+ (\S+:) \s+ (\S+) }xms; print join " ", $bugid,  $description, $username;' test.txt > test3.txt

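The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code inline, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code inline, and append it when printing.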

The regex uses this modifier:
/x : Ignore whitespace and comments, for readability.
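
As an illustration of what /x allows, the same pattern can be spread over several lines with a comment per part. This is only a sketch in script form: the helper name format_line is hypothetical, the input is two of the sample lines hard-coded in an array rather than read from test.txt, and the @ is written as \@ to make it unambiguously literal.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original one-liner): returns
# "bugid description username" for a matching line, or undef otherwise.
sub format_line {
    my ($line) = @_;
    my ($username, $bugid, $description) = $line =~ m{
        < (\S+) \@ \S+    # username: local part of the address in angle brackets
        \s+ (\S+:)        # bug ID, including its trailing colon
        \s+ (\S+)         # description: the next run of non-whitespace
    }x;
    return unless defined $username;
    return join ' ', $bugid, $description, $username;
}

my @lines = (
    '(1111) user1 <user1@mail.com> 112111: description1 - some dummy string',
    '(6473) user2 <user2@mail.com> 112112: description2 - some test string',
);
for my $line (@lines) {
    my $out = format_line($line);
    print "$out\n" if defined $out;
}
```

The /m and /s modifiers from the one-liner are left out of this sketch; they only change how ^, $ and . behave, and this pattern uses none of them.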

See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

perl