How to assign an ID to each replaced string in preg_replace and get a list of matched words

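I already have working code, but I need to add two features to it. The code replaces every bad word in a sentence with dots, leaving only the first letter of the word visible to the reader.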

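The new features I need to add:

  1. Wrap each string replaced by preg_replace in an HTML span with a unique (auto-incrementing) ID.
  2. Add all matched words (including repeated instances), in the same order, to a PHP variable.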

This is my current code:

function sanitize_badwords($string) {
    $list = array(
        "dumb",
        "stupid",
        "brainless"
    );

    # Use array_map to build one regex per word
    $relist = array_map(function($s) {
        return '/(?:\b(' . $s[0] . ')(?=' . substr($s, 1) . '\b)|(?!\A)\G)\pL/';
    }, $list);

    # Run every regex over the string: $1 keeps the first letter, every other letter becomes a dot
    return preg_replace($relist, '$1.', $string);
}

echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!');

The current code prints:

You are kind of d... and b......... Very d...!

After implementing the first feature, the result should be:

You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!

The second feature should give me a PHP array containing all matched words (including repeated instances):

$matches = array('dumb', 'brainless', 'dumb');

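The reason I need this is that, for ToS reasons, I can't print bad words in crawlable HTML, but I still need to show them on mouse-over via JavaScript later (I can easily take the contents of $matches, convert it to a JavaScript array, and assign the words to the hover state of all the bad_N spans).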

You can use preg_replace_callback() and pass $counter by reference so the callback can increment it:

$list = array("dumb", "stupid", "brainless");
$string = 'You are kind of dumb and brainless. Very dumb!';


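// Sort longest first so that a longer word is tried before any shorter word it starts with (thanks @revo)
usort($list, function($a, $b) { return strlen($b) - strlen($a); });

$counter = 0; // Initialize the counter
// Escape regex metacharacters, including the '~' delimiter used below
$list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);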


// Transform the string
$string = preg_replace_callback('~(' . implode('|',$list_q) . ')~', 
    function($matches) use (&$counter) {
       $counter++;
       return '<span id="bad_' . $counter . '">'
           . substr($matches[0], 0, 1)
           . str_repeat('.', strlen($matches[0]) - 1)
           . '</span>' ;
}, $string);

echo $string;

This will output:

You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
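
Note that the pattern above matches the listed words anywhere, including inside longer words such as "dumbbell". The following is a minimal sketch, assuming whole-word, case-insensitive matching is wanted (as the \b anchors in the question's regex suggest). The function name mask_whole_words and the sample sentence are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: masks only whole-word matches, ignoring case.
function mask_whole_words(string $text, array $words, int &$counter): string
{
    // Escape each word for use inside a '~'-delimited pattern.
    $quoted = array_map(function ($w) { return preg_quote($w, '~'); }, $words);

    // \b keeps "dumb" from matching inside "dumbbell"; the i flag ignores case.
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';

    return preg_replace_callback($pattern, function ($m) use (&$counter) {
        $counter++;
        return '<span id="bad_' . $counter . '">'
            . substr($m[0], 0, 1)
            . str_repeat('.', strlen($m[0]) - 1)
            . '</span>';
    }, $text);
}

$counter = 0;
$result = mask_whole_words('A Dumb dumbbell.', array('dumb', 'stupid'), $counter);
echo $result;
// Only the standalone word "Dumb" is wrapped; "dumbbell" is left untouched.
```

With both \b anchors in place the regex engine backtracks to a longer alternative when a shorter one fails the trailing boundary, so sorting the list by length should no longer be required in this variant.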

The same thing wrapped in a function, which also stores the matches in the $references variable:

function sanitize_badwords($string, &$references) {

    static $counter  ;
    static $list  ;
    static $list_q  ;

    if (!isset($counter)) {
        $counter = 0 ;
        $list = array("dumb", "stupid", "brainless");

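        // Sort longest first so that a longer word is tried before any shorter word it starts with (thanks @revo)
        usort($list, function($a, $b) { return strlen($b) - strlen($a); });

        // Escape regex metacharacters, including the '~' delimiter used below
        $list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);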
    }

    return preg_replace_callback('~('.implode('|',$list_q).')~',
        function($matches) use (&$counter, &$references){
            $counter++;
            $references[$counter] = $matches[0];
            return '<span id="bad_'.$counter.'">'
               . substr($matches[0],0,1)
               . str_repeat('.', strlen($matches[0])-1)
               . '</span>' ;

    }, $string) ;
}

$matches = [] ;
echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!', $matches) ;


print_r($matches);

This will output:

You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!

Array
(
    [1] => dumb
    [2] => brainless
    [3] => dumb
)
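
To hand the collected words to JavaScript for the hover behaviour described in the question, one option is json_encode(). This is a minimal sketch, assuming $matches has the shape printed above. The JavaScript variable name badWords is hypothetical, and where the resulting script is served from (so that it satisfies the ToS constraint) is left open.

```php
<?php
// Assumed input: the $matches array produced above, keyed by span ID number.
$matches = array(1 => 'dumb', 2 => 'brainless', 3 => 'dumb');

// The keys start at 1 rather than 0, so json_encode() emits a JSON object
// keyed by ID number instead of a JSON list.
// JSON_HEX_TAG escapes < and > in case the output is embedded in a <script> tag.
$json = json_encode($matches, JSON_HEX_TAG);

// Hypothetical variable name; this line could live in a separately served script.
echo 'var badWords = ' . $json . ';';
```

On the JavaScript side, the word for a span with id "bad_2" could then be looked up as badWords[2], since the keys mirror the numeric suffix of each span ID.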