Perl-regex not case sensitive when it should be
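I am trying to find some special characters, marked as "\v{G}" or "\v{g}" in the sentences of an HTML file, replace them with "Ǧ" and "ǧ", and save the corrected sentences in a new HTML file.
My regular expression (.*)\\v\{(\w)\}(.*)
finds the characters to replace, but I cannot get the replacement to depend on the case. The resulting file contains: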
This is a sentence ǧ with a upper case G.
This is a sentence ǧ with a lower case g.
instead of:
This is a sentence Ǧ with a upper case G.
This is a sentence ǧ with a lower case g.
MWE
The HTML input file contains:
This is a sentence \v{G} with a upper case G.
This is a sentence \v{g} with a lower case g.
The Perl file contains:
use strict;
use warnings;
use feature 'say';  # needed for say() below
use utf8;           # the replacement characters are literal non-ASCII text
# Define variables
my ($inputfile, $outputfile, $inputone, $inputtwo, $part1, $specialcharacter, $part2);
# Initialize variables
$inputfile = "TestFile.html";
$outputfile = 'Results.html';
# Open output file
open(my $ofh, '>:encoding(UTF-8)', "$outputfile");
# Open input file
open(my $ifh, '<:encoding(UTF-8)', "$inputfile");
# Read input file
while(<$ifh>) {
    # Analyse _temp.html file to identify special characters
    # (\\v matches a literal backslash followed by "v")
    ($part1, $specialcharacter, $part2) = ($_ =~ /(.*)\\v\{(\w)\}(.*)/);
    if ($specialcharacter == "g") {
        $specialcharacter = "ǧ";
    }elsif ($specialcharacter == "G") {
        $specialcharacter = "Ǧ";# PROBLEM
    }
    say $ofh "\t\t<p>$part1$specialcharacter$part2";
}
# Close input and output files
close $ifh;
close $ofh;
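As pointed out in the comments, == is the wrong operator here. You should use eq to compare non-numeric scalars. == converts both operands to numbers first: "g" and "G" both numify to 0, so $specialcharacter == "g" is true for either letter and the first branch always wins. With use warnings enabled, Perl also reports this with warnings such as Argument "G" isn't numeric.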
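A small self-contained sketch of why the original if/elsif always took the lower-case branch, and of the same test written with eq. The sub name caron_for is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use utf8;    # the replacement characters below are literal non-ASCII text

# Hypothetical helper: map a captured letter to its caron form
# using string comparison (eq). Returns undef for unmapped letters.
sub caron_for {
    my ($letter) = @_;
    if ($letter eq 'g') {
        return 'ǧ';
    } elsif ($letter eq 'G') {
        return 'Ǧ';
    }
    return;
}

binmode STDOUT, ':encoding(UTF-8)';

# With ==, both operands are converted to numbers first. 'G' and 'g'
# both become 0, so the comparison is true. The "isn't numeric"
# warnings are silenced only inside this block.
{
    no warnings 'numeric';
    print "'G' == 'g' is ", ('G' == 'g' ? 'true' : 'false'), "\n";
}

# With eq, the strings are compared character by character.
print "'G' eq 'g' is ", ('G' eq 'g' ? 'true' : 'false'), "\n";
print caron_for('G'), caron_for('g'), "\n";
```

Running it reports true for the == comparison and false for eq, which is exactly the difference between the two branches in the question.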
Another approach is to build a kind of dictionary, i.e. a lookup table, and look up your special characters in it.
# A map between the special characters and the HTML code you want in its place.
# Fill it with more if you've got them.
# (The values are literal non-ASCII characters, so the script needs "use utf8;".)
my %SpecialMap = (
    'g' => 'ǧ',
    'G' => 'Ǧ',
);
# Read input file
while(<$ifh>) {
    # loop for as long as \v{character} is found in $_
    while(/\\v\{(\w)\}/) {
        # Look up the character in the dictionary.
        # Fallback if it's not in the map: Use the character as-is instead.
        my $ch = $SpecialMap{$1} // $1;
        # Rebuild $_ from the text before the match ($`),
        # the replacement and the text after the match ($')
        $_ = $` . $ch . $';
    }
    # print the result
    print $ofh $_;
}
Given the input
Both \v{g} and \v{G} in here.
This is a sentence \v{g} with a lower case g.
This is a sentence \v{H} with a upper case H which is not in the map.
This contains nothing special.
it produces the following output:
Both ǧ and Ǧ in here.
This is a sentence ǧ with a lower case g.
This is a sentence H with a upper case H which is not in the map.
This contains nothing special.
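To check the lookup-table loop against the sample input and output above without creating any files, here is a self-contained sketch that reads the input from an in-memory filehandle (open on a scalar reference) and prints the result to STDOUT. The helper convert_line is a hypothetical name wrapping the loop from the answer.

```perl
use strict;
use warnings;
use utf8;    # the map contains literal non-ASCII characters

# Lookup table from the answer: letter inside \v{...} => replacement.
my %SpecialMap = (
    'g' => 'ǧ',
    'G' => 'Ǧ',
);

# Hypothetical helper: replace every \v{X} in one line, falling back
# to X itself when X is not in %SpecialMap.
sub convert_line {
    my ($line) = @_;
    while ($line =~ /\\v\{(\w)\}/) {
        my $ch = $SpecialMap{$1} // $1;
        # $` and $' hold the text before and after the last match.
        $line = $` . $ch . $';
    }
    return $line;
}

# Sample input from above. The quoted heredoc keeps \v literal.
my $input = <<'END';
Both \v{g} and \v{G} in here.
This is a sentence \v{g} with a lower case g.
This is a sentence \v{H} with a upper case H which is not in the map.
This contains nothing special.
END

open(my $ifh, '<', \$input) or die "Cannot open in-memory input: $!";
binmode STDOUT, ':encoding(UTF-8)';
while (my $line = <$ifh>) {
    my $converted = convert_line($line);
    print $converted;
}
close $ifh;
```

Each replacement removes one \v{X} marker, so the inner loop always terminates, including for letters such as H that are not in the map.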
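Inspired by Polar Bear's comment, you can use s///ge to run the mapping function and get the same result:
my %SpecialMap = (
    'g' => 'ǧ',
    'G' => 'Ǧ',
);
sub mapfunc {
    my ($ch) = @_;
    # Fallback if it's not in the map: use the character as-is.
    return $SpecialMap{$ch} // $ch;
}
# Read input file
while(<$ifh>) {
    # /g substitute all matches on the line
    # /e by executing mapfunc() for each, passing the captured letter
    s/\\v\{(\w)\}/mapfunc($1)/ge;
    print $ofh $_;
}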
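The same s///ge idea can be assembled into one complete script. This is a sketch, not the answerer's exact code: it reuses the file names from the question, inlines the lookup with exists instead of a separate mapfunc, and checks the results of open and close. The helper name convert_line and the -e guard (which skips the file conversion when TestFile.html is absent) are illustrative additions.

```perl
use strict;
use warnings;
use utf8;    # the map below contains literal non-ASCII characters

# Letters that may appear inside \v{...} and their replacements.
my %SpecialMap = (
    'g' => 'ǧ',
    'G' => 'Ǧ',
);

# Hypothetical helper: replace every \v{X} on a line in one substitution.
# /g handles all matches on the line; /e evaluates the replacement as
# Perl code: the mapped character if X is in the map, otherwise X itself.
sub convert_line {
    my ($line) = @_;
    $line =~ s/\\v\{(\w)\}/exists $SpecialMap{$1} ? $SpecialMap{$1} : $1/ge;
    return $line;
}

# File names taken from the question.
my $inputfile  = 'TestFile.html';
my $outputfile = 'Results.html';

# Only convert when the input file exists, so convert_line() can also
# be exercised on its own.
if (-e $inputfile) {
    open(my $ifh, '<:encoding(UTF-8)', $inputfile)
        or die "Cannot open $inputfile: $!";
    open(my $ofh, '>:encoding(UTF-8)', $outputfile)
        or die "Cannot open $outputfile: $!";
    while (my $line = <$ifh>) {
        print {$ofh} convert_line($line);
    }
    close $ifh;
    close $ofh or die "Cannot close $outputfile: $!";
}
```

Using exists rather than // makes the fallback explicit: a letter is replaced only when it has an entry in %SpecialMap.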