PHP using preg_match to match items in an array with values that may or may not contain accented characters

Question

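preg_match must match any word in the $string variable (as long as it is at least 3 characters long) against any word in the $forbidden array. The catch is this:

If $string contains the word mamíferos (with an accent) instead of mamiferos, it should still count as a match. The same applies the other way round: if acompañar is in the forbidden list but the user types acompanar (without the accent), that should match too.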

$forbidden = array('mamiferos', 'acompañar');

$string = 'los mamíferos corren libres y quieren acompanar a su madre';

if(preg_match('/\b(?:'.implode('|', $forbidden).'){3,}/i', $string)) {
    echo 'match!';
} else {
    echo 'nope...';
}

Answer 1

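This works for me. All you need to do is strip the accents from both the input string and the forbidden words before matching, although this may not be the best answer.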

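    $forbidden = array('mamiferos', 'acompañar');

    $string = 'los mamíferos corren libres y quieren acompanar a su madre';

    // Strip accents from the input; the forbidden words are stripped the same way below.
    $stripped_accent = remove_accent($string);

    // \b...\b restricts the match to whole words, and /i restores the
    // case-insensitivity the question's original pattern had.
    if (preg_match('/\b(' . remove_accent(implode('|', $forbidden)) . ')\b/i', $stripped_accent)) {
        echo 'match!';
    } else {
        echo 'nope...';
    }

    // Maps the Latin-1 accented letters to unaccented ASCII letters. utf8_decode() first
    // converts UTF-8 to ISO-8859-1 so that strtr() can map one byte per character.
    // Note: the result is ISO-8859-1, not UTF-8, and utf8_decode() is deprecated as of PHP 8.2.
    function remove_accent($accent) {
        return strtr(utf8_decode($accent), utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'), 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
    }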
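Because utf8_decode() is deprecated as of PHP 8.2 and returns ISO-8859-1 rather than UTF-8, a UTF-8-safe variant of the same idea may be handy. The sketch below assumes the same character table as remove_accent() above; the function name remove_accent_utf8() is hypothetical. It relies on strtr() with an array argument, which replaces each multi-byte key as a whole string, so no conversion to ISO-8859-1 is needed and no extension is required.

```php
<?php
// Hypothetical UTF-8-safe variant of remove_accent(): no utf8_decode(),
// and the returned string stays UTF-8.
function remove_accent_utf8(string $text): string
{
    static $map = null;
    if ($map === null) {
        // Same character table as remove_accent() (51 letters on each side).
        $from = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ';
        $to   = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY';
        // preg_split('//u') splits the UTF-8 string into whole characters,
        // so each multi-byte accented letter becomes one key of the map.
        $map = array_combine(
            preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY),
            str_split($to)
        );
    }
    // strtr() with an array replaces each multi-byte key as a unit.
    return strtr($text, $map);
}

$forbidden = ['mamiferos', 'acompañar'];
$string = 'los mamíferos corren libres y quieren acompanar a su madre';

// Strip and escape each forbidden word, then match whole words case-insensitively.
$words = array_map(fn (string $w): string => preg_quote(remove_accent_utf8($w), '/'), $forbidden);
$pattern = '/\b(?:' . implode('|', $words) . ')\b/iu';

$matched = preg_match($pattern, remove_accent_utf8($string)) === 1;
echo $matched ? 'match!' : 'nope...', "\n";
```

The map is built once per request (static variable), and the /u flag is safe here because the stripped text is still valid UTF-8.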

Answer 2

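I'd suggest a solution based on removing all combining Unicode characters from both the string being filtered and the forbidden words. It requires the intl extension (sudo apt install php7.4-intl && sudo phpenmod intl). First, it decomposes the Unicode string into base characters and combining marks; second, it removes all of the marks (\p{M}):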

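<?php
$string = 'los mamíferos corren libres y quieren acompanar a su madre';

$forbidden = ['mamiferos', 'acompañar'];

// Decompose to NFD (base letters + combining marks), then drop every mark (\p{M}).
function strip(string $accented): string {
    $decomposed = Normalizer::normalize($accented, Normalizer::FORM_D);
    return preg_replace('/\p{M}/u', '', $decomposed);
}

// True if any of $words occurs as a whole word in $string, ignoring accents and case.
function filter(string $string, array $words): bool {
    // Escape each stripped word so regex metacharacters are matched literally.
    $quoted = array_map(fn (string $w): string => preg_quote(strip($w), '/'), $words);
    $regex = '/\b(?:' . implode('|', $quoted) . ')\b/iu';
    // preg_match() returns 1, 0 or false; only 1 means a match.
    return preg_match($regex, strip($string)) === 1;
}
echo ((filter($string, $forbidden) ? 'match!' : 'nope...') . "\n");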
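To make the two steps concrete, here is a small worked example on a single character. It assumes the intl extension is installed (for the Normalizer class) and only restates what FORM_D and \p{M} do; the variable names are illustrative.

```php
<?php
// Requires the intl extension for the Normalizer class.

// 'í' written as one precomposed code point, U+00ED.
$composed = "\u{00ED}";

// Step 1: FORM_D (canonical decomposition) splits it into the base letter 'i'
// (U+0069) followed by U+0301 COMBINING ACUTE ACCENT.
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);

echo bin2hex($composed), "\n";   // c3ad
echo bin2hex($decomposed), "\n"; // 69cc81

// Step 2: \p{M} matches Unicode marks, so removing them leaves only 'i'.
$stripped = preg_replace('/\p{M}/u', '', $decomposed);
echo $stripped, "\n";            // i

// The same two steps turn 'acompañar' into 'acompanar' (ñ becomes n + U+0303).
$word = preg_replace('/\p{M}/u', '', Normalizer::normalize('acompañar', Normalizer::FORM_D));
echo $word, "\n";                // acompanar
```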

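By the way, I don't understand what the {3,} in your regex is for, so I removed it from mine. If you thought it would match a string containing three or more forbidden words, it won't: {3,} repeats the whole (?:...) group, so it only matches three or more forbidden words written back to back with nothing between them, such as mamiferosacompanarmamiferos.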
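The difference is easy to check directly. The sketch below first runs the question's quantified group on two inputs, then shows one possible way (an assumption, not something either answer proposes) to apply the "at least 3 characters" rule to the words of the string instead; words_of_min_length() is a hypothetical helper, and the inputs are kept unaccented so the example does not depend on intl.

```php
<?php
// The question's pattern: {3,} repeats the whole (?:...) group 3+ times in a row.
$original = '/\b(?:mamiferos|acompanar){3,}/i';

var_dump(preg_match($original, 'los mamiferos corren y quieren acompanar')); // int(0)
var_dump(preg_match($original, 'mamiferosacompanarmamiferos'));             // int(1)

// Hypothetical helper: return only the runs of at least $min letters.
function words_of_min_length(string $text, int $min = 3): array
{
    // \p{L} matches any Unicode letter, so accented letters count as letters.
    preg_match_all('/\p{L}{' . $min . ',}/u', $text, $m);
    return $m[0];
}

$forbidden = ['mamiferos', 'acompanar'];
$words = words_of_min_length('los mamiferos corren y quieren acompanar a su madre');

// Keep the words (lower-cased, ASCII input assumed) that appear in the forbidden list.
$hits = array_values(array_intersect(array_map('strtolower', $words), $forbidden));
print_r($hits);
```

Here the length check filters the candidate words ('y', 'a' and 'su' are skipped), and the list lookup decides whether any of them is forbidden.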

Further reading: https://www.php.net/manual/en/class.normalizer.
