How to correctly set up a regex to substitute multi-line variable placeholders with Perl

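I am using Perl 5.20 to write complex configuration files for a data conversion tool.

The configuration files contain placeholders, resolved either at load time or at run time, that wrap some path code, for example: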

# Load time placeholder example
CONFIG: NAME ${/path/pos/*/test/*[123] == 'ABC' }

# Runtime placeholder example
COLUMN: CSV_NAME STRING DEFAULT :{./CSV_FIRST} 

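For various reasons, this should also work for multi-line expressions.

I wrote the scanner using Text::ParseWords with the standard delimiter \s+. Before a data line is split into single words, I want to escape the placeholder expressions with base64 encoding so that they no longer contain anything matching \s+. The expression is also the key for the later data substitution.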
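The escaping idea described above can be sketched as follows. This is only a minimal illustration: the placeholder text, the variable names and the `CONFIG: NAME` line are made up for the example, and it assumes the scanner splits with `quotewords` from Text::ParseWords.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use Text::ParseWords qw(quotewords);

# Hypothetical placeholder text containing blanks and a newline.
my $placeholder = "\${/path/pos/*/test/*[123]\n   == 'ABC' }";

# An empty second argument suppresses the line breaks that
# encode_base64 would otherwise insert, so the token is one word.
my $token = '$' . encode_base64($placeholder, '');

# Split the line on whitespace, as the scanner is assumed to do.
my $line  = "CONFIG: NAME $token";
my @words = quotewords('\s+', 0, $line);

print scalar(@words), " words\n";

# Strip the prefix and decode to get the original expression back.
print decode_base64(substr($words[2], 1)), "\n";
```

Because the base64 alphabet contains no whitespace, quotes or backslashes, the token survives the word split unchanged and can be decoded afterwards.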

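The escaping is driven by a defined pattern match (see the code below):

 my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;

With $pfx = '$', for example, this in my opinion defines a multi-line pattern for ${...} while skipping masked (escaped) \${...} expressions.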
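One detail of this pattern is worth illustrating: a leading class such as `[^\\]` consumes the character in front of the prefix, so that character becomes part of the captured key, and a placeholder at the very start of the string cannot match. The sketch below compares it with a negative lookbehind, which is an alternative suggested here and not something the original code uses. The sample text is hypothetical.

```perl
use strict;
use warnings;

my $pfx  = '$';
my $text = 'a ${one two} b \${masked} c ${three}';

# Variant from the question: the character before the prefix is captured too.
my @with_lead = $text =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;

# Sketch with a negative lookbehind: nothing before the prefix is consumed.
my @bare = $text =~ /((?<!\\)\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;

print "[$_]\n" for @with_lead, @bare;
```

Both variants skip the masked `\${masked}` expression. Only the first one includes the preceding blank in each match.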

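Problem

I struggled for a while with the inner part of the pattern and got [^\Q$pfx\E\{\}]+ to work, but it does not feel right, because it

  1. only excludes a set of individual symbols,
  2. rather than the sequence of the outer delimiters,
  3. which prevents, for example, nested expressions.

What is the correct way to express this?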

Test routine

#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64;

use feature qw(signatures);
no warnings 'once';
no warnings 'experimental';
no warnings 'experimental::signatures';

my $line =
'# test data
 This are:
 1. ${/multiline/used/*[3]
      = "12345"}
 2. ${/single/line/compile/time/pattern/*[3]}
 3. ${/single/line/runtime/pattern/x == 1234}
 4. ${/multi/line/runtime/pattern \
      defer/1 \
      defer/2 \
      defer/3
     }
 5. ${//PG.GRM/*[
        key eq "TEST.VAR"
       ]}
';

sub testSpacedPlaceHolder($pfx, $line) {
    my %match;
    
    my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;
    my %seen = ();
    my @uniq = grep { ! $seen{$_} ++ } @pat;    
    for my $key (@uniq) {
        my $hkey=$pfx.encode_base64($key);
        $hkey =~ s/\n//g;
        my $var = substr($key, 3, -1);
        $match{$hkey}= [ $var, $key ];
        $line =~ s/\Q$key\E/$hkey/g;
    }
    # Test the output ------------------------------
    my $cnt = 0;
    print "\nRESULT:\n";
    for my $key (sort keys %match) {
        $cnt++;
        my ($var, $orig) = @ { $match{$key} };
        print "---- $cnt ----\n";
        print "ORG: $orig\n";
        print "VAR: $var\n";
        print "ESC: $key\n";

    }
    print "\nLINE:\n$line\n";
    return ($line, \%match);
}

testSpacedPlaceHolder('$', $line);

Result

/usr/bin/env perl "test-strings.pl"

RESULT:
---- 1 ----
ORG:  ${/multi/line/runtime/pattern \
      defer/1 \
      defer/2 \
      defer/3
     }
VAR: /multi/line/runtime/pattern \
      defer/1 \
      defer/2 \
      defer/3
     
ESC: $ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
---- 2 ----
ORG:  ${/multiline/used/*[3]
      = "12345"}
VAR: /multiline/used/*[3]
      = "12345"
ESC: $ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
---- 3 ----
ORG:  ${/single/line/compile/time/pattern/*[3]}
VAR: /single/line/compile/time/pattern/*[3]
ESC: $ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
---- 4 ----
ORG:  ${/single/line/runtime/pattern/x == 1234}
VAR: /single/line/runtime/pattern/x == 1234
ESC: $ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
---- 5 ----
ORG:  ${//PG.GRM/*[
        key eq "TEST.VAR"
       ]}
VAR: //PG.GRM/*[
        key eq "TEST.VAR"
       ]
ESC: $ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=

LINE:
# test data
 This are:
 1.$ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
 2.$ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
 3.$ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
 4.$ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
 5.$ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=

Edit

Suppose I have a script that defines some kind of configuration:

MAGIC: MAGIC.TYPE

CONTAINER: NAME BEGIN

DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'

INTERNAL.CONTAINER: INTERNAL.NAME BEGIN

    TAG1: 'ABCDEF'
    TAG2: ${/NAME/VAR1}

    # Unwanted nested variant 
    TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }

    # Valid runtime interpolation variant
    TAG4: "${/NAME/VAR1}/:{NAME.KEY}"
    
    # Valid runtime path variant but ignored
    TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}

END.INTERNAL.NAME
 
END.NAME 

I want to skip the nested line

    # Nested variant 
    TAG3: ${/NAME/VAR1 ${/NAME/VAR2}}

for variable resolution reasons, but keep lines such as

    # Valid runtime path variant but ignored
    TAG5: ${/NAME/VAR2/*/:{TEST{KEY}}

because they are runtime driven.

Because of the simple character class [^\$\{\}]+, my variant rejects TAG5.
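The effect of the simple character class can be checked with a small sketch. The three sample lines are taken from the configuration above, with TAG5 written with balanced braces, which is an assumption. The lookbehind form of the pattern is used only for brevity.

```perl
use strict;
use warnings;

my %line = (
    TAG2 => 'TAG2: ${/NAME/VAR1}',
    TAG3 => 'TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }',
    TAG5 => 'TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}',
);

my %found;
for my $tag (sort keys %line) {
    # The body may contain neither '$' nor braces.
    my @m = $line{$tag} =~ /((?<!\\)\$\{[^\$\{\}]+\})/g;
    $found{$tag} = \@m;
    print "$tag: ", (@m ? join(', ', @m) : '(no match)'), "\n";
}
```

TAG2 matches as expected. For TAG3 only the inner `${/NAME/VAR2}` is found, and TAG5 is not matched at all because of the inner braces.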

Here is an example of how a recursive regex can be used to exclude nested versions of ${...}:

use feature qw(say);
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $str = <<'END_STR';
MAGIC: MAGIC.TYPE

CONTAINER: NAME BEGIN

DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'

INTERNAL.CONTAINER: INTERNAL.NAME BEGIN

    TAG1: 'ABCDEF'
    TAG2: ${/NAME/VAR1}

    # Unwanted nested variant
    TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }

    # Valid runtime interpolation variant
    TAG4: "${/NAME/VAR1}/:{NAME.KEY}"

    # Valid runtime path variant but ignored
    TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}

END.INTERNAL.NAME

END.NAME
END_STR

my @matches;
while ($str =~ /(?:^|(?<!\\))
                   (?<G3>(?<G1> \$ \{ (?:
                       (?>(?:[^{}\$\\] | (?:\\.) |
                         (?<G2> \{ (?: (?>[^{}]+) | (?&G2))* \} ))
                           | (?<G4>(?&G1))))* \}))/msxg) {
    next if defined $+{G4}; # skip nested matches
    push @matches, $+{G3};
}
print Dumper(\@matches);

Output:

$VAR1 = [
          '${/NAME/VAR1}',
          '${/NAME/VAR1}',
          '${/NAME/VAR1/*/:{TEST{KEY}}}'
        ];

Note that the result includes TAG2, TAG4 and TAG5, but not TAG3.
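To connect this back to the base64 escaping from the question, the following sketch replaces every non-nested placeholder with an encoded token and keeps a lookup table for restoring it. It uses a simplified balanced-brace pattern, not the exact regex above. The group names `ph` and `br`, the helper `escape_placeholder` and the `index` check for nesting are all assumptions made for this illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Balanced-brace sketch: "br" recurses into itself for inner {...} groups.
my $re = qr/ (?<!\\) (?<ph> \$ (?<br> \{ (?: [^{}]++ | (?&br) )* \} ) ) /x;

my %match;    # token => original placeholder

# Hypothetical helper: encode a placeholder unless it nests another ${...}.
sub escape_placeholder {
    my ($ph) = @_;
    return $ph if index($ph, '${', 2) >= 0;    # nested: leave untouched
    my $token = '$' . encode_base64($ph, '');
    $match{$token} = $ph;
    return $token;
}

my $text = <<'END_TEXT';
TAG2: ${/NAME/VAR1}
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}
END_TEXT

(my $escaped = $text) =~ s/$re/escape_placeholder($+{ph})/ge;
print $escaped;

# Restoring the originals from the lookup table reverses the escaping.
my $restored = $escaped;
for my $token (keys %match) {
    $restored =~ s/\Q$token\E/$match{$token}/g;
}
```

The nested TAG3 expression is consumed as one match and returned unchanged, so its inner `${/NAME/VAR2}` is not escaped separately.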