How do I correctly set up a regex to substitute multiline variable placeholders with Perl?
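I am using Perl 5.20 to write complex configuration files for a data conversion tool.
The configuration files contain load-time and runtime placeholders that wrap some path code, for example: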
# Load time placeholder example
CONFIG: NAME ${/path/pos/*/test/*[123] == 'ABC' }
# Runtime placeholder example
COLUMN: CSV_NAME STRING DEFAULT :{./CSV_FIRST}
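For certain reasons, the placeholders must also work as multiline expressions.
I wrote the scanner with Text::ParseWords, using the standard delimiter \s+. Before a data line is split into single words, I want to escape the placeholder expressions with base64 encoding so that they no longer contain \s+. The expression is also the key for the later data substitution.
The escaping is driven by a defined pattern match (see the code below):
my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;
With $pfx = '$', for example, this IMO defines the multiline pattern ${...} while skipping escaped \${...} expressions.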
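To make the escaping idea concrete, here is a minimal sketch of the round trip: encode the placeholder, split on whitespace, decode again. The input line is made up for illustration, and the sketch uses a lookbehind (?<!\\) instead of the [^\\] from the pattern above, so that the character in front of the placeholder is not pulled into the match. That is a swapped-in choice, not what the scanner currently does.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my $pfx  = '$';
my $line = 'COLUMN: NAME ${/a/b == "x y"} END';   # hypothetical input line

# Swap each unescaped placeholder for prefix + base64 text. Passing '' as the
# second argument makes encode_base64 emit no line breaks at all.
my $escaped = $line =~ s/(?<!\\)(\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/$pfx . encode_base64($1, '')/ger;

# The placeholder now survives a whitespace split as one word.
my @words = split /\s+/, $escaped;

# Strip the prefix and decode to get the original expression back.
my $restored = decode_base64(substr($words[2], length $pfx));

print scalar(@words), " words\n";
print "$restored\n";
```

With the sample line this should give four words, and the third one decodes back to the original ${...} text including its inner spaces.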
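Question
I struggled for a while with the inner part of the pattern and got [^\Q$pfx\E\{\}]+ to work, but it does not feel right, because it
- only lists the set of symbols that must not be used,
- but not the outer sequence,
- e.g. to prevent nested expressions.
What is the correct expression?
Test routine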
#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64;
use feature qw(signatures);
no warnings 'once';
no warnings 'experimental';
no warnings 'experimental::signatures';
my $line =
'# test data
This are:
1. ${/multiline/used/*[3]
= "12345"}
2. ${/single/line/compile/time/pattern/*[3]}
3. ${/single/line/runtime/pattern/x == 1234}
4. ${/multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
}
5. ${//PG.GRM/*[
key eq "TEST.VAR"
]}
';
sub testSpacedPlaceHolder($pfx, $line) {
    my %match;
    # Capture each unescaped placeholder together with the character before it.
    my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;
    my %seen = ();
    my @uniq = grep { ! $seen{$_}++ } @pat;
    for my $key (@uniq) {
        # Base64 text contains no whitespace once its newlines are removed.
        my $hkey = $pfx . encode_base64($key);
        $hkey =~ s/\n//g;
        # Skip the leading character, the prefix and '{'; drop the closing '}'.
        my $var = substr($key, 3, -1);
        $match{$hkey} = [ $var, $key ];
        $line =~ s/\Q$key\E/$hkey/g;
    }
    # Test the output ------------------------------
    my $cnt = 0;
    print "\nRESULT:\n";
    for my $key (sort keys %match) {
        $cnt++;
        my ($var, $orig) = @{ $match{$key} };
        print "---- $cnt ----\n";
        print "ORG: $orig\n";
        print "VAR: $var\n";
        print "ESC: $key\n";
    }
    print "\nLINE:\n$line\n";
    return ($line, \%match);
}
testSpacedPlaceHolder('$', $line);
Result
/usr/bin/env perl "test-strings.pl"
RESULT:
---- 1 ----
ORG: ${/multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
}
VAR: /multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
ESC: $ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
---- 2 ----
ORG: ${/multiline/used/*[3]
= "12345"}
VAR: /multiline/used/*[3]
= "12345"
ESC: $ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
---- 3 ----
ORG: ${/single/line/compile/time/pattern/*[3]}
VAR: /single/line/compile/time/pattern/*[3]
ESC: $ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
---- 4 ----
ORG: ${/single/line/runtime/pattern/x == 1234}
VAR: /single/line/runtime/pattern/x == 1234
ESC: $ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
---- 5 ----
ORG: ${//PG.GRM/*[
key eq "TEST.VAR"
]}
VAR: //PG.GRM/*[
key eq "TEST.VAR"
]
ESC: $ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=
LINE:
# test data
This are:
1.$ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
2.$ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
3.$ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
4.$ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
5.$ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=
Edit
Suppose I have a script that defines some kind of configuration:
MAGIC: MAGIC.TYPE
CONTAINER: NAME BEGIN
DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'
INTERNAL.CONTAINER: INTERNAL.NAME BEGIN
TAG1: 'ABCDEF'
TAG2: ${/NAME/VAR1}
# Unwanted nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
# Valid runtime interpolation variant
TAG4: "${/NAME/VAR1}/:{NAME.KEY}"
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}
END.INTERNAL.NAME
END.NAME
I want to avoid nested lines such as
# Nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2}}
for variable-resolution reasons, but keep
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR2/*/:{TEST{KEY}}
because they are runtime driven.
Because of the simple exclusion set [^$\{\}]+, my variant blocks TAG5.
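A small sketch of what the simple exclusion set does on those two lines. The pattern below is the one from the test routine with $pfx fixed to '$' and a lookbehind in place of the leading [^\\] (an assumption made here so that only the placeholder itself is captured):

```perl
use strict;
use warnings;

# Body may contain anything except '$', '{' and '}'.
my $simple = qr/(?<!\\)(\$\{[^\$\{\}]+\})/s;

my ($tag3) = 'TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }' =~ $simple;
my ($tag5) = 'TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}' =~ $simple;

print "TAG3: ", $tag3 // 'no match', "\n";
print "TAG5: ", $tag5 // 'no match', "\n";
```

On TAG3 the outer placeholder fails at the inner '$', so the regex engine moves on and picks up the inner ${/NAME/VAR2} instead. On TAG5 the body stops at the first inner '{' and nothing matches at all. So the set neither rejects nesting cleanly nor keeps the runtime variant.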
Here is an example of how a recursive regex can be used to exclude nested versions of ${...}:
use feature qw(say);
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $str = <<'END_STR';
MAGIC: MAGIC.TYPE
CONTAINER: NAME BEGIN
DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'
INTERNAL.CONTAINER: INTERNAL.NAME BEGIN
TAG1: 'ABCDEF'
TAG2: ${/NAME/VAR1}
# Unwanted nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
# Valid runtime interpolation variant
TAG4: "${/NAME/VAR1}/:{NAME.KEY}"
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}
END.INTERNAL.NAME
END.NAME
END_STR
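my @matches;
# G1: one placeholder; G2: a balanced brace group inside it;
# G4: set only when a placeholder is nested inside another one.
while ($str =~ /(?:^|(?<!\\))
    (?<G3>(?<G1> \$ \{ (?:
        (?>(?:[^{}\$\\] | (?:\\.) |
            (?<G2> \{ (?: (?>[^{}]+) | (?&G2))* \} ))
        | (?<G4>(?&G1))))* \}))/msxg) {
    next if defined $+{G4}; # skip nested matches
    push @matches, $+{G3};
}
print Dumper(\@matches);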
Output:
$VAR1 = [
'${/NAME/VAR1}',
'${/NAME/VAR1}',
'${/NAME/VAR1/*/:{TEST{KEY}}}'
];
Note that the result includes TAG2, TAG4 and TAG5, but not TAG3.
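If the goal is to feed this back into the base64 escaping step from the question, something along these lines might work. It is only a sketch, and it deliberately uses a simplified pattern rather than the regex above: it matches any brace-balanced ${...} as a whole and then filters out the ones that contain a second "${". The names escape_placeholders, $placeholder and the sample config are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Unescaped dollar sign followed by one brace-balanced group.
# The named group "body" recurses into itself for inner brace groups.
my $placeholder = qr/
    (?<!\\) \$
    (?<body>
        \{
        (?: [^{}\\]++ | \\. | (?&body) )*
        \}
    )
/xs;

sub escape_placeholders {
    my ($text) = @_;
    my %map;    # escaped token => original placeholder text
    my $encode = sub {
        my ($orig) = @_;
        # A second "${" after position 0 means a nested placeholder:
        # hand it back unchanged.
        return $orig if index($orig, '${', 1) >= 0;
        my $token = '$' . encode_base64($orig, '');
        $map{$token} = $orig;
        return $token;
    };
    $text =~ s/($placeholder)/$encode->($1)/ge;
    return ($text, \%map);
}

my $config = <<'END_CFG';
TAG2: ${/NAME/VAR1}
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}
END_CFG

my ($escaped, $map) = escape_placeholders($config);
print $escaped;
print scalar(keys %$map), " placeholders escaped\n";
```

Because the nested TAG3 placeholder is consumed as one match, the substitution continues after its closing brace and never looks at the inner ${/NAME/VAR2} on its own. TAG2 and TAG5 are replaced by whitespace-free tokens, and the hash keeps the mapping needed to restore them later.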