Matching German umlauts with a regexp correctly twice

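I have a small script that uses a regular expression to check whether a string contains German umlauts such as äöüß. The first match works fine, but when I check the same string again it no longer matches correctly. The file itself is encoded as UTF-8, and I also use the utf8 pragma.

Here is the script: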

#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use utf8;
use Log4Perl::logger_helper qw( init_logger get_logger_and_trace );

my $strings = ["ä", "ae","ö", "oe", "ü", "ue", "ß", "ss"];

my $logger = init_logger(
    log_file_path => $0 . '.log'
);    # init_logger variables are all optional

foreach my $string (@$strings) {
    for(1..5) {
        if ( $string =~ /[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/gi ) {
            $logger->info("umlauts match $string");
        }
        else {
            $logger->info("no umlauts $string");
        }
    }
}

Here is the output:

umlauts match ä
no umlauts ä
umlauts match ä
no umlauts ä
umlauts match ä
no umlauts ae
no umlauts ae
no umlauts ae
no umlauts ae
no umlauts ae
umlauts match ö
no umlauts ö
umlauts match ö
no umlauts ö
umlauts match ö
no umlauts oe
no umlauts oe
no umlauts oe
no umlauts oe
no umlauts oe
umlauts match ü
no umlauts ü
umlauts match ü
no umlauts ü
umlauts match ü
no umlauts ue
no umlauts ue
no umlauts ue
no umlauts ue
no umlauts ue
umlauts match ß
no umlauts ß
umlauts match ß
no umlauts ß
umlauts match ß
no umlauts ss
no umlauts ss
no umlauts ss
no umlauts ss
no umlauts ss

Process finished with exit code 0

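I tested it with different versions of Strawberry Perl on different operating systems, and the latest version (strawberry-perl-5.30.0.1-64bit-portable) shows this error too.

Any idea why the match alternates between succeeding and failing? If I do the same thing with multiple index operations, it works fine.

Thanks in advance.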

The problem is the global flag /g. Remove it.
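A minimal sketch of that change, assuming the same character class as in the question. The helper name has_umlaut is hypothetical and only used for illustration, and plain print replaces the logger so the sketch runs without extra modules:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Same character class as in the question, but compiled without /g,
# so no search position is remembered between matches.
my $umlaut_re = qr/[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/i;

# has_umlaut is a hypothetical helper name used only for this sketch.
sub has_umlaut {
    my ($string) = @_;
    return $string =~ $umlaut_re ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

foreach my $string ("ä", "ae") {
    for (1 .. 3) {
        # Every iteration starts matching at the beginning of the string.
        my $message = has_umlaut($string)
            ? "umlauts match $string"
            : "no umlauts $string";
        print "$message\n";
    }
}
```

Without /g each match is independent of the previous one, so repeated checks of the same string should give the same answer every time.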

As @daxim explained, the global flag /g is wreaking havoc here.

From "Regexp Quote-Like Operators" in perlop:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.

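Because you search the same $string repeatedly without modifying it in between, each search resumes where the last successful match ended. Nothing matches after that position, so the search fails and resets the search position to the beginning of the string, and the next search succeeds again. This is why the results alternate.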
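The alternation can be made visible by recording pos() around each match. This is only an illustrative sketch: the character class is shortened to the lowercase umlauts and ß, and the string is written with an escape so that the result does not depend on the file encoding:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "\x{E4}";    # "ä"
my @trace;

for my $attempt (1 .. 4) {
    # pos() is undef when the next search will start at the beginning.
    my $start   = pos($string) // 0;
    # Scalar-context match with /g, as in the question.
    my $matched = $string =~ /[\x{E4}\x{F6}\x{FC}\x{DF}]/g ? 1 : 0;
    my $after   = pos($string) // 'undef';
    push @trace, "attempt $attempt: start=$start matched=$matched pos=$after";
}

print "$_\n" for @trace;
# attempt 1: start=0 matched=1 pos=1
# attempt 2: start=1 matched=0 pos=undef
# attempt 3: start=0 matched=1 pos=1
# attempt 4: start=1 matched=0 pos=undef
```

Attempts 2 and 4 start at offset 1, past the only character, so they fail and reset the position, which lets attempts 1 and 3 succeed.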

See also "Global matching" in "Using regular expressions in Perl":

The modifier /g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have /g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
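For contrast, here is a small sketch of a case where /g in scalar context is the intended tool: walking through all umlauts in one string. The sample text and variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Gr\x{FC}\x{DF}e aus K\x{F6}ln";    # "Grüße aus Köln"
my $re   = qr/[\x{E4}\x{F6}\x{FC}\x{DF}]/;

# Scalar context: each iteration resumes after the previous match.
my @positions;
while ($text =~ /$re/g) {
    push @positions, pos($text);    # offset just past the current match
}

# List context: /g returns all matches at once; "= () =" counts them.
my $count = () = $text =~ /$re/g;

print "positions: @positions\n";    # positions: 3 4 12
print "count: $count\n";            # count: 3
```

The while loop ends on the first failed match, which also resets the search position, so the list-context match afterwards starts from the beginning of the string again.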

Bye, Daniel :-)