Perl-regex not case sensitive when it should be

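I am trying to find some special characters, marked as "\v{G}" or "\v{g}", in the sentences of an HTML file, replace them with "Ǧ" and "ǧ", and save the corrected sentences in a new HTML file.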

My regular expression (.*)\\v\{(\w)\}(.*) finds the characters to replace, but I cannot make the replacement depend on the case: the resulting file contains:

This is a sentence ǧ with a upper case G.
This is a sentence ǧ with a lower case g. 

instead of:

This is a sentence Ǧ with a upper case G.
This is a sentence ǧ with a lower case g.

MWE

The HTML input file contains:

This is a sentence \v{G} with a upper case G.
This is a sentence \v{g} with a lower case g.

The Perl file contains:

use strict;
use warnings;
use feature 'say';    # say() needs this feature (or "use v5.10;")

# Define variables
my ($inputfile, $outputfile, $inputone, $inputtwo, $part1, $specialcharacter, $part2);

# Initialize variables
$inputfile = "TestFile.html";
$outputfile = 'Results.html';

# Open output file
open(my $ofh, '>:encoding(UTF-8)', "$outputfile");

# Open input file
open(my $ifh, '<:encoding(UTF-8)', "$inputfile");

# Read input file
while(<$ifh>) {
    # Analyse _temp.html file to identify special characters
        ($part1, $specialcharacter, $part2) = ($_ =~ /(.*)\\v\{(\w)\}(.*)/);
        if ($specialcharacter == "g") {
            $specialcharacter = "&#487";
        }elsif ($specialcharacter == "G") {
            $specialcharacter = "&#486";# PROBLEM 
        }
        say $ofh "\t\t<p>$part1$specialcharacter$part2";
}

# Close input and output files
close $ifh;
close $ofh;

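As mentioned in the comments, == is the wrong operator here. == compares numerically: it converts both operands to numbers, and a string such as "g" or "G" that does not start with a digit becomes 0 (with an "isn't numeric" warning under use warnings). So $specialcharacter == "g" is true for both letters, the first branch always wins, and every match becomes &#487. Use eq to compare non-numeric scalars.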
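A minimal sketch of the difference, using a hypothetical helper pick_entity that rewrites the question's if/elsif with eq. The numeric comparison is wrapped in "no warnings 'numeric'" only to silence the expected warning:

```perl
use strict;
use warnings;

# Hypothetical helper: the question's if/elsif, rewritten with string comparison.
sub pick_entity {
    my ($ch) = @_;
    if ($ch eq 'g') {
        return '&#487;';    # ǧ
    } elsif ($ch eq 'G') {
        return '&#486;';    # Ǧ
    }
    return $ch;             # any other letter is left unchanged
}

{
    # Numeric comparison: "g" and "G" both numify to 0, so this is true.
    no warnings 'numeric';
    print '"g" == "G" is ', ("g" == "G" ? 'true' : 'false'), "\n";
}

# String comparison distinguishes the two letters.
print '"g" eq "G" is ', ("g" eq "G" ? 'true' : 'false'), "\n";
print pick_entity('G'), "\n";
```

Run as a script, this prints "true", then "false", then &#486; for the upper-case letter.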

Another approach is to build a kind of dictionary, i.e. a lookup table, and look up your special character in it.

# A map between the special characters and the html code you want in its place.
# Fill it with more if you've got them.
my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

# Read input file
while(<$ifh>) {
    # loop for as long as \v{character} is found in $_
    while (/\\v\{(\w)\}/) {
        # Look up the character in the dictionary.
        # Fallback if it's not in the map: use the character as-is instead.
        my $ch = $SpecialMap{$1} || $1;
        # Rebuild $_
        $_ = $` . $ch . $';
    }
    # print the result
    print $ofh $_;
}

For the input

Both \v{g} and \v{G} in here.
This is a sentence \v{g} with a lower case g.
This is a sentence \v{H} with a upper case H which is not in the map.
This contains nothing special.

it produces the following output:

Both &#487; and &#486; in here.
This is a sentence &#487; with a lower case g.
This is a sentence H with a upper case H which is not in the map.
This contains nothing special.
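To check that output without separate input and output files, the rebuild loop can be wrapped in a function and run on the sample lines directly. This is a sketch; convert_line is a hypothetical name, and the lines are kept in single quotes so that \v stays literal:

```perl
use strict;
use warnings;

# Same lookup table as in the answer above.
my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

# Hypothetical helper: apply the prematch/postmatch rebuild loop to one line.
# \\v matches a literal backslash followed by "v"; a bare \v in a
# Perl regex means "vertical whitespace" instead.
sub convert_line {
    my ($line) = @_;
    while ($line =~ /\\v\{(\w)\}/) {
        # Fall back to the letter itself when it is not in the map.
        my $ch = $SpecialMap{$1} || $1;
        $line = $` . $ch . $';
    }
    return $line;
}

# Sample input from above.
my @input = (
    'Both \v{g} and \v{G} in here.',
    'This is a sentence \v{g} with a lower case g.',
    'This is a sentence \v{H} with a upper case H which is not in the map.',
    'This contains nothing special.',
);

print convert_line($_), "\n" for @input;
```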

Inspired by Polar Bear's comment, you can use s///ge to call a mapping function for each match and get the same result:

my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

sub mapfunc {
    return $SpecialMap{$1} || $1;    # $1 is the capture from the s///ge match
}

# Read input file
while(<$ifh>) {
    # /g: substitute all matches on the line
    # /e: evaluate mapfunc() as Perl code for each match
    s/\\v\{(\w)\}/mapfunc()/ge;
    print $ofh $_;
}
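Since the output file is already opened with :encoding(UTF-8), one variant (an alternative, not part of the answers above) is to write the characters themselves instead of numeric entities: &#486; is U+01E6 (Ǧ) and &#487; is U+01E7 (ǧ). In this sketch, %CaronMap and convert_line are hypothetical names, and exists is used so that the fallback does not depend on the mapped value being true:

```perl
use strict;
use warnings;

# Map to the actual characters; \x{...} keeps the source file pure ASCII.
my %CaronMap = (
    'g' => "\x{1E7}",    # ǧ, U+01E7 (decimal 487)
    'G' => "\x{1E6}",    # Ǧ, U+01E6 (decimal 486)
);

# Hypothetical helper: replace every \v{X} on a line in one s///ge pass.
sub convert_line {
    my ($line) = @_;
    # Unknown letters fall back to the bare letter.
    $line =~ s/\\v\{(\w)\}/exists $CaronMap{$1} ? $CaronMap{$1} : $1/ge;
    return $line;
}

# Encode the characters as UTF-8 when printing to the terminal.
binmode STDOUT, ':encoding(UTF-8)';
print convert_line('This is a sentence \v{G} with a upper case G.'), "\n";
```

When writing to $ofh, no binmode call is needed, because that handle already has the UTF-8 encoding layer.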