How to assign an ID to each replaced string in preg_replace and get a list of matched words
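I already have working code, but I need to add two features. The code replaces every bad word in a sentence with dots, leaving only the first letter of the word visible to the reader.
The new features I need to add:
Wrap each string replaced by preg_replace in an HTML span with a unique (auto-incrementing) ID.
Add all matched words (including repeated instances), in the same order, to a PHP variable.
Here is my current code: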
function sanitize_badwords($string) {
$list = array(
"dumb",
"stupid",
"brainless"
);
# use array_map to generate a regex of array for each word
$relist = array_map(function($s) {
return '/(?:\b(' . $s[0] . ')(?=' . substr($s, 1) . '\b)|(?!\A)\G)\pL/';
}, $list);
# call preg_replace using list of regex
return preg_replace($relist, '<span id="bad_'.$counter.'">.</span>', $string);
}
echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!');
The current code prints:
You are kind of d... and b......... Very d....!
After implementing the first feature, the result should be:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
The second feature should give me a PHP array containing all matched words (including repeated instances):
$matches = array('dumb', 'brainless', 'dumb');
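I need this because, for ToS reasons, I cannot print the bad words in crawlable HTML, but I still need to show them on mouse hover later via JavaScript (I can easily take the contents of $matches, convert it to a JavaScript array, and assign the words to the hover state of all the bad_ id spans).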
You can use preg_replace_callback() and pass $counter by reference so the callback can increment it:
$list = array("dumb", "stupid", "brainless");
$string = 'You are kind of dumb and brainless. Very dumb!';
// Sort longest first so a longer word is tried before any shorter word it starts with (thanks @revo)
usort($list, function($a,$b) { return strlen($b) - strlen($a); });
$counter = 0 ; // Initialize the counter
$list_q = array_map('preg_quote', $list) ; // secure strings for RegExp
// Transform the string
$string = preg_replace_callback('~(' . implode('|',$list_q) . ')~',
function($matches) use (&$counter) {
$counter++;
return '<span id="bad_' . $counter . '">'
. substr($matches[0], 0, 1)
. str_repeat('.', strlen($matches[0]) - 1)
. '</span>' ;
}, $string);
echo $string;
This outputs:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
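The reason for sorting the list longest first may not be obvious from this word list, since none of the three words is a prefix of another. A minimal sketch of what the sort guards against, assuming a hypothetical list in which "dumb" is a prefix of "dumber" (that second word is not in the original list):

```php
<?php
// Hypothetical word list: "dumb" is a prefix of "dumber".
$words = array("dumb", "dumber");
$text  = "dumber";

// Unsorted: alternatives are tried left to right, so "dumb" wins
// and the trailing "er" is left outside the match.
preg_match('~(' . implode('|', $words) . ')~', $text, $m1);

// Longest first: "dumber" is tried before "dumb".
usort($words, function ($a, $b) { return strlen($b) - strlen($a); });
preg_match('~(' . implode('|', $words) . ')~', $text, $m2);

echo $m1[0], "\n"; // dumb
echo $m2[0], "\n"; // dumber
```

Without the sort, the replacement would mask only part of the longer word.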
Using a function that stores the matches in the $references variable:
function sanitize_badwords($string, &$references) {
static $counter ;
static $list ;
static $list_q ;
if (!isset($counter)) {
$counter = 0 ;
$list = array("dumb", "stupid", "brainless");
// Sort longest first so a longer word is tried before any shorter word it starts with (thanks @revo)
usort($list, function($a,$b) { return strlen($b) - strlen($a); });
$list_q = array_map('preg_quote', $list);
}
return preg_replace_callback('~('.implode('|',$list_q).')~',
function($matches) use (&$counter, &$references){
$counter++;
$references[$counter] = $matches[0];
return '<span id="bad_'.$counter.'">'
. substr($matches[0],0,1)
. str_repeat('.', strlen($matches[0])-1)
. '</span>' ;
}, $string) ;
}
$matches = [] ;
echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!', $matches) ;
print_r($matches);
This outputs:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
Array
(
[1] => dumb
[2] => brainless
[3] => dumb
)
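For the hover step mentioned in the question, one possible way to hand the collected words to JavaScript is json_encode(). This is only a sketch: it assumes the $matches array shown above, and the JSON_HEX_TAG and JSON_HEX_AMP flags are my own choice, not something the question asked for.

```php
<?php
// Assumed input: the array collected by sanitize_badwords() above.
$matches = array(1 => 'dumb', 2 => 'brainless', 3 => 'dumb');

// The keys start at 1, so json_encode() emits an object keyed by the
// span counter rather than a plain list.
// JSON_HEX_TAG / JSON_HEX_AMP escape "<", ">" and "&" so the result is
// safer to embed inside a <script> block.
$json = json_encode($matches, JSON_HEX_TAG | JSON_HEX_AMP);

echo $json, "\n"; // {"1":"dumb","2":"brainless","3":"dumb"}
```

Because the keys match the counter, the entry under key 3 belongs to the span with id bad_3.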