Match positions of named capture groups in Perl's regexps

In Perl, named capture groups can be used to extract data from a string with a regular expression:

perl -wle '
    use Data::Dumper;
    "abc" =~ / (?<B> (?<A> a ) b ) c /x and print "match!";
    print Dumper(\%+);
'

prints

match!
$VAR1 = {
      'B' => 'ab',
      'A' => 'a'
    };

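But how do I get the positions of the matches A and B within the string "abc"? When using unnamed capture groups, one can refer to the regex variables @- and @+, but this does not work for named groups (*).

(*) With 'does not work' I mean that I cannot use the name of the capture group to retrieve the position, but only the number of the group (e.g. $-[1] for the start position of group A, but not something like $START_POS{A}). This ridicules the use of named capture groups, and may not even be possible if one does not know the order of the capture groups in advance.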

one can refer to the regex variables @- and @+, but this does not work for named groups.

Let us first establish that @+ and @- work as expected:

perl -wle '
    use Data::Dumper;
    "abc" =~ / (?<B> (?<A> a ) b ) c /x and print "match!";
    print Dumper(\@+);'
match!
$VAR1 = [
          3,
          2,
          1
        ];

perl -wle '
    use Data::Dumper;
    "abc" =~ / (?<B> (?<A> a ) b ) c /x and print "match!";
    print Dumper(\@-);'
match!
$VAR1 = [
          0,
          0,
          0
        ];
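To make the meaning of those numbers concrete, here is a minimal sketch that pairs each entry of @- with the matching entry of @+ and cuts the captured text back out of the string. The array @spans and the output format are illustrative names of my own, not anything Perl provides.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = 'abc';
my @spans;    # illustrative: one [start, end, text] triple per numbered group

if ($str =~ / (?<B> (?<A> a ) b ) c /x) {
    # Index 0 describes the whole match; $#+ is the number of the last group.
    for my $n (1 .. $#+) {
        next unless defined $-[$n];    # skip groups that did not take part in the match
        my ($start, $end) = ($-[$n], $+[$n]);
        push @spans, [ $start, $end, substr($str, $start, $end - $start) ];
        printf "group %d: offsets %d..%d, text '%s'\n", $n, $start, $end, $spans[-1][2];
    }
}
```

Groups are numbered by their opening parenthesis, so here group 1 is B (offsets 0..2, text 'ab') and group 2 is A (offsets 0..1, text 'a'). The end offset points just past the last matched character.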

Now, since I posted the above, you have expanded your question by saying

(*) With 'does not work' I mean that I cannot use the name of the capture group to retrieve the position, but only the number of the group (e.g. $-[1] for the start position of group A, but not something like $START_POS{A}). This ridicules the use of named capture groups, and may not even be possible if one does not know the order of the capture groups in advance. (emphasis mine)

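I do not quite understand what you mean or why you need this, and my repeated inquiries remain unanswered, so here is the answer to your literal question.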

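Looking at perldoc perlvar, we note that there is currently no mechanism that gives you another hash for looking up match positions by the name of a capture group.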

%LAST_PAREN_MATCH %+

Similar to @+, the %+ hash allows access to the named capture buffers, should they exist, in the last successful match in the currently active dynamic scope.

For example, $+{foo} is equivalent to $1 after the following match:

   'foo' =~ /(?<foo>foo)/;

The keys of the %+ hash list only the names of buffers that have captured (and that are thus associated to defined values).

The underlying behaviour of %+ is provided by the Tie::Hash::NamedCapture module. … This variable was added in Perl v5.10.0. This variable is read-only and dynamically-scoped.

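Building such a lookup table after a match is actually rather straightforward but, as I mentioned before, I am not sure why you would need it, and it does not look like the best solution to any problem I have run into before. It is probably not the most suitable solution to your problem either, so you would be better off explaining what you are actually trying to solve.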

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my $str = 'abc';

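# Stop if the match fails; @- and @+ would not describe this string then.
$str =~ / (?<B> (?<A> a) b) c /x
    or die "no match\n";

# Map each captured substring to its [start, end] offsets.
# Index 0 is the whole match; $#+ is the number of the last group.
my %captured_to_pos = map +(substr($str, $-[$_], $+[$_] - $-[$_]) => [$-[$_], $+[$_]]), 0 .. $#+;

# Look up the offsets through the text captured by each named group.
print Dumper $captured_to_pos{$+{$_}} for qw( A B );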

Output:

$VAR1 = [
          0,
          1
        ];
$VAR1 = [
          0,
          2
        ];

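If different capture groups can match the same string, you will have to be more careful, but without a well-motivated explanation from you I see no reason to delve into that.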
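For what it is worth, here is a minimal sketch of that caveat, using a made-up string 'aa' in which groups A and B both capture 'a'. A table keyed by captured text cannot tell them apart. One possible workaround, which is my assumption and nothing perlvar offers, is to record pos() by name from (?{ ... }) code blocks inside the pattern. This assumes Perl 5.18 or later, where such blocks see lexical variables reliably. The names %by_text and %by_name are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = 'aa';

# 1) Table keyed by captured text: A and B both capture 'a', so they collide.
my %by_text;
if ($str =~ / (?<A> a ) (?<B> a ) /x) {
    for my $n (0 .. $#+) {
        my $text = substr($str, $-[$n], $+[$n] - $-[$n]);
        $by_text{$text} = [ $-[$n], $+[$n] ];    # a later group overwrites an earlier one
    }
}
print "by text, A and B both map to: @{ $by_text{a} }\n";

# 2) Offsets recorded by name from inside the pattern (assumption: Perl 5.18+).
#    Inside a code block, $_ is the target string and pos() the current offset.
my %by_name;
$str =~ /
    (?<A> (?{ $by_name{A}[0] = pos() }) a (?{ $by_name{A}[1] = pos() }) )
    (?<B> (?{ $by_name{B}[0] = pos() }) a (?{ $by_name{B}[1] = pos() }) )
/x;
print "$_: @{ $by_name{$_} }\n" for sort keys %by_name;
```

The first table reports 1 2 for both names, which is wrong for A. The second reports A as 0 1 and B as 1 2. Code blocks also run during attempts that are later abandoned by backtracking, so for anything less trivial than this pattern the recorded offsets may be stale and would need extra care.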