Regex works, but I get a warning: "matches null string many times in regex"

Question

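I have a string containing a number of components that I need to extract. They are well structured and predictable, but the order in which they appear varies. Below is a snippet that shows what the strings look like and the regex I am using to extract the information I need. This code works, and I get the expected output.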

my $str1 = '(test1=cat)(test2=dog)(test3=mouse)';         # prints cat\ndog\nmouse
$str1 = '(test1=cat)(test3=mouse)(test2=dog)(test1=cat)'; # prints cat\ndog\nmouse
$str1 = '(test3=mouse)(test1=cat)';                       # prints cat\nempty\nmouse
$str1 = '(test3=mouse)(test2=dog)';                       # prints empty\ndog\nmouse
my $pattern1 = '(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))*';

if (my @map = $str1 =~ /$pattern1/) {
    foreach my $match (@map) {
        say $match if $match;
        say "empty" if !$match;
    }
}

The expected and actual output for the last string above is:

empty
dog
mouse

However, in addition to the expected output, I also get the following warnings:

(?=.*\(test1=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))* <-- HERE (?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))*/ at /path/to/scratch1.pl line 32.
(?=.*\(test2=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))* <-- HERE (?=.*\(test3=(.*?)\))*/ at /path/to/scratch1.pl line 32.
(?=.*\(test3=(.*?)\))* matches null string many times in regex; marked by <-- HERE in m/(?=.*\(test1=(.*?)\))*(?=.*\(test2=(.*?)\))*(?=.*\(test3=(.*?)\))* <-- HERE / at /path/to/scratch1.pl line 32.

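This tells me that although my regex works, it probably has a problem.

How can I adjust the regex above so that it keeps working as expected while eliminating the warnings?

Here are some constraints I have to work within: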

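The order of the results must be preserved (e.g. "test1" will always be the first element of the array)
The field names are not really "testN"; there are many unique names I have to work with, and they are static values
Duplicates can occur, but the last one should be used (which is what the script above does)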

I don't normally use lookarounds, so my mistake may be a basic one (hopefully). Any suggestions or feedback are much appreciated. Thanks!

Edit - running Perl 5.20

Answer 1

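It makes no sense to match a look-ahead (?=...) multiple times. It doesn't consume any data from the object string, so if it matches once it will go on matching at the same position indefinitely.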
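To make the "doesn't consume any data" point concrete, here is a minimal sketch. The helper name lookahead_probe is made up for illustration. It uses the built-in @- and @+ arrays to measure the width of the overall match, and shows that the capture inside the look-ahead is still filled in.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the width of the overall match and the
# captured value, or nothing at all if the look-ahead fails.
sub lookahead_probe {
    my ($str) = @_;
    return unless $str =~ / (?= .* \( test1= (.*?) \) ) /x;
    # $-[0] and $+[0] are the start and end offsets of the whole match
    return ( $+[0] - $-[0], $1 );
}

my ( $width, $value ) = lookahead_probe('(test3=mouse)(test1=cat)');
print "match width: $width, captured: $value\n";
```

The width is zero even though the capture is populated, so repeating the group with * cannot advance through the string. I believe that is what the warning is complaining about.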

The main change you need to make is to replace (?=.*\(test1=(.*?)\))* and so on with (?=.*\(test1=(.*?)\))?. That simply makes your look-ahead "optional", and it will get rid of your warnings.

use strict;
use warnings 'all';

use Data::Dump;

my $pattern = qr/
    (?= .* \( test1= (.*?) \) )?
    (?= .* \( test2= (.*?) \) )?
    (?= .* \( test3= (.*?) \) )?
/x;

my @strings = qw/
    (test1=cat)(test2=dog)(test3=mouse)
    (test1=cat)(test3=mouse)(test2=dog)(test1=cat)
    (test3=mouse)(test1=cat)
    (test3=mouse)(test2=dog)
/;

for my $str ( @strings ) {

    next unless my @map = $str =~ /$pattern/;

    $_ //= 'empty' for @map;

    dd \@map;
}

Output

["cat", "dog", "mouse"]
["cat", "dog", "mouse"]
["cat", "empty", "mouse"]
["empty", "dog", "mouse"]

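However, this sounds like another case of making a single regex pattern do too much work. You're writing in Perl, so why not use it?

The following code assumes the same header as the full program above, up to and including the definition of @strings. The for loop is all that I have changed.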

for my $str ( @strings ) {
    my @map = map { $str =~ / \( test$_= ( [^()]* ) \) /x ? $1 : 'empty' } 1 .. 3;
    dd \@map;
}

Output

["cat", "dog", "mouse"]
["cat", "dog", "mouse"]
["cat", "empty", "mouse"]
["empty", "dog", "mouse"]
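One caveat: a plain match like the one in the loop above finds the first occurrence of a field, whereas the question asks for the last one when a field is repeated. Here is a rough sketch of one way to handle that, together with arbitrary field names. The @fields list and the name extract_in_order are placeholders for illustration, not anything from the original program.

```perl
use strict;
use warnings;

# Placeholder field names standing in for the real static names.
my @fields = qw/ test1 test2 test3 /;

# Returns one value per field, in @fields order. The leading greedy .*
# pushes the match to the last occurrence of a repeated field, and a
# missing field becomes 'empty'. \Q...\E quotes any regex metacharacters
# that a real field name might contain.
sub extract_in_order {
    my ($str) = @_;
    return map {
        $str =~ / .* \( \Q$_\E = ( [^()]* ) \) /xs ? $1 : 'empty'
    } @fields;
}

print join( ', ', extract_in_order('(test1=cat)(test3=mouse)(test1=lion)') ), "\n";
```

Because the result list is built by walking @fields, the output order follows that list whatever order the components appear in the string.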

Or perhaps something different would be appropriate. Hashes are useful for this sort of thing.

for my $str ( @strings ) {
    my %map = $str =~ / \( ( test\d+ ) = ( [^()]* ) \) /gx; 
    dd \%map;
}

Output

{ test1 => "cat", test2 => "dog", test3 => "mouse" }
{ test1 => "cat", test2 => "dog", test3 => "mouse" }
{ test1 => "cat", test3 => "mouse" }
{ test2 => "dog", test3 => "mouse" }
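If you still need an ordered array, as the question requires, the hash can be turned back into one by looking up a fixed list of names. This is only a sketch: @fields and ordered_values are names I made up, and the \w+ in the pattern assumes the real field names consist of word characters.

```perl
use strict;
use warnings;

# Placeholder field names, in the order the results must come out.
my @fields = qw/ test1 test2 test3 /;

# Collects every (name=value) pair into a hash, then reads the values
# back out in @fields order, substituting 'empty' for missing names.
sub ordered_values {
    my ($str) = @_;
    # In list assignment to a hash a later duplicate key overwrites an
    # earlier one, so the last occurrence of a field wins.
    my %map = $str =~ / \( ( \w+ ) = ( [^()]* ) \) /gx;
    return map { $map{$_} // 'empty' } @fields;
}

print join( ', ', ordered_values('(test3=mouse)(test1=cat)(test1=lion)') ), "\n";
```

Note that // only substitutes for undefined values, so a field that is present with an empty value, such as (test1=), comes back as an empty string rather than 'empty'.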
