Using multiple backreferences in Perl

Question

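I am trying to use multiple backreferences in Perl to capture 5 different patterns from one string, but I get no matches except the first one.

I tried the following: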

my $string = ">abc|XYUXYU|KIOKEIK_7XNCU Happy, not-happy apple banana X ORIG=Came from trees NBMR 12345 OZ=1213379 NG=popZ AZ=2 BU=1";
$string =~ m/>(abc)|(.*)|.*ORIG=(.*)[A-Z].*NG=(.*)\s(.*)\s/;

print "First match should be 'abc'. We got: $1\n";
print "Second match should be 'XYUXYU'. We got: $2\n";
print "Third match should be 'Came from trees'. We got: $3\n";
print "Fourth match should be 'popZ'. We got: $4\n";
print "Fifth match should be 'AZ=2'. We got: $5\n";

I want this as the output:

First match should be 'abc'. We got: abc
Second match should be 'XYUXYU'. We got: XYUXYU
Third match should be 'Came from trees'. We got: Came from trees
Fourth match should be 'popZ'. We got: popZ
Fifth match should be 'AZ=2'. We got: AZ=2

Any idea how to do this the right way in Perl?

Answer 1

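You have to escape each | by prefixing it with \, otherwise it means alternation (a|b matches a or b). With the pipes unescaped, the first alternative >(abc) matches on its own, which is why only the first group is filled. For your third match, you have to make the quantifier * non-greedy by appending ?. You also need to slightly adjust the pattern after the third capture group so that it matches a whitespace character followed by at least one uppercase letter. (It is not entirely clear what the full range of possible inputs is, because you only gave one example without further details, so the pattern may need more tuning.)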
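To make the alternation point concrete, here is a minimal sketch on a shortened, hypothetical sample string (the names $sample, @unescaped and @escaped are illustrative and not from the question). In list context a match returns all capture groups, with undef for groups that did not take part in the match:

```perl
use strict;
use warnings;

# Shortened, hypothetical sample in the same shape as the question's string.
my $sample = ">abc|XYUXYU|rest";

# Unescaped pipes: three alternatives, ">(abc)" OR "(.*)" OR ".*".
# The first alternative succeeds at position 0, so group 2 stays undef.
my @unescaped = $sample =~ m/>(abc)|(.*)|.*/;

# Escaped pipes: \| matches a literal '|', so both groups belong to one match.
my @escaped = $sample =~ m/>(abc)\|(.*)\|/;

print "unescaped: ", join(", ", map { defined $_ ? "'$_'" : "undef" } @unescaped), "\n";
print "escaped:   ", join(", ", map { defined $_ ? "'$_'" : "undef" } @escaped), "\n";
```

The full corrected script for the original string follows.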

#!/usr/bin/perl

use strict;
use warnings;

my $string = ">abc|XYUXYU|KIOKEIK_7XNCU Happy, not-happy apple banana X ORIG=Came from trees NBMR 12345 OZ=1213379 NG=popZ AZ=2 BU=1";
$string =~ m/>(abc)\|(.*)\|.*ORIG=(.*?)\s[A-Z]+.*NG=(.*)\s(.*)\s/;

print "First match should be 'abc'. We got: $1\n";
print "Second match should be 'XYUXYU'. We got: $2\n";
print "Third match should be 'Came from trees'. We got: $3\n";
print "Fourth match should be 'popZ'. We got: $4\n";
print "Fifth match should be 'AZ=2'. We got: $5\n";

Output:

First match should be 'abc'. We got: abc
Second match should be 'XYUXYU'. We got: XYUXYU
Third match should be 'Came from trees'. We got: Came from trees
Fourth match should be 'popZ'. We got: popZ
Fifth match should be 'AZ=2'. We got: AZ=2
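Since the question gives only one sample line, a somewhat stricter variant may be worth considering. The sketch below is not the pattern from the answer above. It assumes that the pipe-delimited fields never contain a |, that the ORIG value ends right before an all-uppercase word, and that the NG value and the field after it contain no whitespace. It also captures in list context, so a failed match can be detected instead of reading stale or undefined $1..$5. The name @fields is illustrative.

```perl
use strict;
use warnings;

my $string = ">abc|XYUXYU|KIOKEIK_7XNCU Happy, not-happy apple banana X ORIG=Came from trees NBMR 12345 OZ=1213379 NG=popZ AZ=2 BU=1";

# Assumptions (not confirmed by the question): see the paragraph above.
# The /x flag lets the pattern be spread out and commented.
my @fields = $string =~ m{
    > (abc) \|          # 1: literal 'abc' before the first pipe
    ([^|]*) \|          # 2: everything up to the next pipe
    .* ORIG= (.*?)      # 3: shortest text after ORIG= ...
    \s [A-Z]+ \b        #    ... that is followed by an all-uppercase word
    .* NG= (\S+)        # 4: run of non-whitespace after NG=
    \s+ (\S+)           # 5: the next whitespace-delimited field
}x;

if (@fields) {
    # List-context match returned the five captures in order.
    printf "Match %d: %s\n", $_ + 1, $fields[$_] for 0 .. $#fields;
} else {
    print "No match\n";
}
```

Using [^|]* and \S+ instead of .* keeps each group from running past its delimiter, which reduces the amount of backtracking the pattern relies on.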

