PHP regex to match possible accented characters

Question

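I've found many questions about this, but none of them helped me solve my particular problem. The situation: I want to search a string for something like "blablebli" and find matches for every possible accented variant ("blablebli", "blábleblí", "blâblèbli", etc.) in a text.

I already have a workaround for the opposite case (finding a word without accents when I typed it with them). But I can't figure out a way to do what I want here.

Here is my working code (the relevant part; it sits inside a foreach, so we only see a single word being searched):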

$word="something";
$word = preg_quote(trim($word), '/'); // Escape regex metacharacters, including the "/" delimiter
$word2 = $this->removeAccents($word); // Removed all accents
if(!empty($word)) {
    $sentence = "/(".$word.")|(".$word2.")/ui"; // Now I'm checking with and without accents.
    if (preg_match($sentence, $content)){
        echo "found";
    }
}

And my removeAccents() function (I'm not sure the preg_replace() covers every possible accent. So far it works. I'd appreciate it if someone could check whether I'm missing anything):

function removeAccents($string)
{
    return preg_replace('/[\`\~\']/', '', iconv('UTF-8', 'ASCII//TRANSLIT', $string));
}

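Things I want to avoid:

I know I could go through my $word and replace every a with [aàáãâä], and likewise for the other letters, but I don't know... it seems like overkill.
Of course, I could also use my own removeAccents() function in my if statement to check $content with its accents removed, for example: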
```
if (preg_match($sentence, $content) || preg_match($sentence, removeAccents($content)))
```

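But my problem with this second option is that I want to highlight the word found after the match, so I can't change my $content.

Is there any way to improve my preg_match() so that it includes possible accented characters? Or should I go with the first option above?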

Answer 1

I would decompose the string, which makes it easier to strip the offending characters. Roughly like this:

<?php

// Convert Unicode input to NFKD form (Normalizer requires the intl extension).
$str = Normalizer::normalize("blábleblí", Normalizer::FORM_KD);

// Remove all combining characters (https://en.wikipedia.org/wiki/Combining_character).
var_dump(preg_replace('/[\x{0300}-\x{036f}]/u', "", $str));
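
Since the question also asks for highlighting in the original text, here is a minimal sketch of one way to build on the decomposition idea. It is a variation, not the code above: it matches against the decomposed text with a pattern that allows optional combining marks (`\p{Mn}*`) after each base character, then recomposes the result. The helper name `accentInsensitivePattern()` and the sample text are made up for illustration, and it uses NFD rather than NFKD so that compatibility characters in the content are left alone. Letters that do not decompose (such as ø or ł) would not be covered.

```php
<?php
// Assumes the intl extension is loaded (it provides Normalizer).

/**
 * Hypothetical helper: build a pattern that matches $word with any
 * number of combining marks after each base character.
 */
function accentInsensitivePattern(string $word): string
{
    // Decompose the search term and drop its marks, leaving base letters only.
    $base = preg_replace('/\p{Mn}+/u', '', Normalizer::normalize($word, Normalizer::FORM_D));

    $parts = [];
    foreach (preg_split('//u', $base, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Each base character may be followed by zero or more combining marks.
        $parts[] = preg_quote($char, '/') . '\p{Mn}*';
    }

    return '/' . implode('', $parts) . '/ui';
}

$content = "Ele disse blábleblí e depois BLÂBLÈBLI.";

// Search the decomposed text, wrap each match, then recompose for output.
// Note: Normalizer::normalize() returns false on invalid UTF-8 input.
$decomposed  = Normalizer::normalize($content, Normalizer::FORM_D);
$highlighted = Normalizer::normalize(
    preg_replace(accentInsensitivePattern('blablebli'), '<b>$0</b>', $decomposed),
    Normalizer::FORM_C
);

echo $highlighted, "\n";
```

Because the marks are kept in the text and only made optional in the pattern, the highlighted words keep their accents in the output.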

Answer 2

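Thanks everyone for the help, but I'm going to wrap this up by using the first option I suggested in my question. Thanks again to @CasimiretHippolyte for the patience and for making me realize it isn't as much overkill as I thought.

Here is the final code I'm using (functions first):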

function removeAccents($string)
{
    return preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_KD));
}

function addAccents($string)
{
    $array1 = array('a', 'c', 'e', 'i' , 'n', 'o', 'u', 'y');
    $array2 = array('[aàáâãäå]','[cçćĉċč]','[eèéêë]','[iìíîï]','[nñ]','[oòóôõö]','[uùúûü]','[yýÿ]');

    return str_replace($array1, $array2, strtolower($string));
}

And:

$word="something";
$word = preg_quote(trim($word), '/'); // Escape regex metacharacters, including the "/" delimiter
$word2 = $this->addAccents($this->removeAccents($word)); //check all possible accents
if(!empty($word)) {
    $sentence = "/(".$word.")|(".$word2.")/ui"; // Now I'm checking my normal word and the possible variations of it.
    if (preg_match($sentence, $content)){
        echo "found";
    }
}
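
For the highlighting part of the question, a rough sketch of how the same pattern could be reused with preg_replace() is shown here. The function name `addAccentClasses()`, the sample text, and the `<mark>` wrapper are assumptions for illustration; the character classes are copied from addAccents() above and only cover those letters. It uses strtr() instead of str_replace(), since strtr() does not re-scan text it has already substituted. It also assumes the search word is already accent-free and that $content uses precomposed (NFC) characters.

```php
<?php
// Hypothetical standalone variant of addAccents(); classes copied from the answer.
function addAccentClasses(string $plainWord): string
{
    $map = [
        'a' => '[aàáâãäå]', 'c' => '[cçćĉċč]', 'e' => '[eèéêë]', 'i' => '[iìíîï]',
        'n' => '[nñ]', 'o' => '[oòóôõö]', 'u' => '[uùúûü]', 'y' => '[yýÿ]',
    ];

    // strtr() with an array never rewrites an already-inserted replacement.
    return strtr(strtolower($plainWord), $map);
}

$content = "Primeiro blábleblí, depois Blâblèbli.";
$pattern = '/' . addAccentClasses(preg_quote('blablebli', '/')) . '/ui';

// $content itself is searched, so each match keeps its original accents and case.
$highlighted = preg_replace($pattern, '<mark>$0</mark>', $content);

echo $highlighted, "\n";
```

Since $content is never modified before matching, the wrapped text is exactly what appeared in the source.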

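By the way, I covered all the possible accents from my country (and a few others). Before using it, you should check whether the addAccents() function needs to be extended.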
