PHP preg_replace: assign a different replacement pattern to each capturing group
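I am trying to perform a MySQL full-text search in boolean mode, and I need to prepare the search text before building the MySQL query.
To achieve this, I thought I could use the PHP function preg_replace and replace each capturing group with its own specific pattern.
- The first pattern must find words or sentences between quotes ("hello world") and add a + in front of them (+"hello world").
- The second pattern must find the remaining words (without quotes) and add a + in front of and a * after each of them (+how*).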
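Regex pattern
["']+([^"']+)["']+|([^\s"']+)
Replacement pattern
+"$1" +$2*
Example
For the following input:
"hello world" how are you?
it should return:
+"hello world" +how* +are* +you?*
But instead it returns something wrong:
+"hello world" +* +"" +how* +"" +are* +"" +you?*
I know the replacement pattern +"$1" +$2* can never work, because nowhere do I specify that +"..." should apply only to the first capturing group and +...* only to the second.
PHP code
$query = preg_replace('~["\']+([^"\']+)["\']+|([^\s"\']+)~', '+"$1" +$2*', $query);
Is there a way to achieve this in PHP? Thank you in advance.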
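To make the failure concrete, here is a minimal sketch. It assumes the replacement string is `+"$1" +$2*`, and the variable names are only for illustration. It shows that preg_replace applies the whole replacement string to every match, and that a group which did not take part in the match expands to an empty string:

```php
<?php
// Pattern from the question: group 1 = quoted text, group 2 = bare word.
$pattern = '~["\']+([^"\']+)["\']+|([^\s"\']+)~';
$input   = '"hello world" how are you?';

// Assumed replacement string: both backreferences are used for every match,
// so whichever group did not match contributes an empty string.
$result = preg_replace($pattern, '+"$1" +$2*', $input);

echo $result, PHP_EOL;
// +"hello world" +* +"" +how* +"" +are* +"" +you?*
```

A plain replacement string cannot branch on which alternative matched, which is why a callback is needed.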
Edit / Solution
Thanks to @revo's suggestion to use the PHP function preg_replace_callback, I managed to assign a replacement pattern to each search pattern with the extended function preg_replace_callback_array. Note that this function requires PHP >= 7.
Here is the final version of the function I use to perform a FULLTEXT search via MATCH (...) AGAINST (...) IN BOOLEAN MODE. The function is declared in the class dbReader of a WordPress plugin. Maybe it is useful to someone.
// Return at most 100 ids of products matching $query in name or description,
// searching for each word using MATCH ... AGAINST in BOOLEAN MODE
public function search_products($query) {
    // A closure is used instead of a nested named function: a named function
    // declared inside a method is redeclared (fatal error) on a second call.
    $replace_callback = function ($m, $f) {
        return sprintf($f, isset($m[1]) ? $m[1] : $m[0]);
    };

    // Replace single quotes by double quotes in strings between quotes:
    // iPhone '8 GB' => iPhone "8 GB"
    // Apple's iPhone 8 '32 GB' => Apple's iPhone 8 "32 GB"
    // This is necessary later when the matches are divided into two groups:
    // 1. Strings not between double quotes
    // 2. Strings between double quotes
    $query = preg_replace("~(\s*)'+([^']+)'+(\s*)~", '$1"$2"$3', $query);

    // Treat numbers together with their units as one word:
    // iPhone 8 64 GB => iPhone 8 "64 GB"
    $pattern = array(
        '(\b[.,0-9]+)\s*(gb\b)',
        '(\b[.,0-9]+)\s*(mb\b)',
        '(\b[.,0-9]+)\s*(mm\b)',
        '(\b[.,0-9]+)\s*(mhz\b)',
        '(\b[.,0-9]+)\s*(ghz\b)'
    );
    array_walk($pattern, function (&$value) {
        // Surround with double quotes only if the user isn't doing manual grouping
        $value = '~'.$value.'(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';
    });
    $query = preg_replace($pattern, '"$1 $2"', $query);

    // Prepare query string for a "match against" in "boolean mode"
    $patterns = array(
        // 1. All strings not surrounded by double quotes
        '~([^\s"]+)(?=(?:[^"]*"[^"]*")*[^"]*\Z)~' => function ($m) use ($replace_callback) {
            return $replace_callback($m, '+%s*');
        },
        // 2. All strings between double quotes
        '~"+([^"]+)"+~' => function ($m) use ($replace_callback) {
            return $replace_callback($m, '+"%s"');
        }
    );

    // Replace every single word by a boolean expression: +some* +word*
    // Respect quoted strings: +"iPhone 8"
    // preg_replace_callback_array needs PHP version >= 7
    $query = preg_replace_callback_array($patterns, $query);

    $fulltext_fields = array(
        'title' => array(
            'importance' => 1.5,
            'table' => 'p',
            'fields' => array(
                'field1',
                'field2',
                'field3',
                'field4'
            )
        ),
        'description' => array(
            'importance' => 1,
            'table' => 'p',
            'fields' => array(
                'field5',
                'field6',
                'field7',
                'field8'
            )
        )
    );

    $select_match = $match_full = $priority_order = "";
    $args = array();
    foreach ($fulltext_fields as $index => $obj) {
        $match = $obj['table'].".".implode(", ".$obj['table'].".", $obj['fields']);
        $select_match .= ", MATCH ($match) AGAINST (%s IN BOOLEAN MODE) AS {$index}_score";
        $match_full .= ($match_full!=""?", ":"").$match;
        $priority_order.= ($priority_order==""?"ORDER BY ":" + ")."({$index}_score * {$obj['importance']})";
        array_push($args, $query);
    }
    $priority_order .= $priority_order!=""?" DESC":"";

    // User input $query is passed as %s parameter to db->prepare() in order to avoid SQL injection
    array_push($args, $query, $this->model_name, $this->view_name);

    return $this->db->get_col(
        $this->db->prepare(
            "SELECT p.__pk $select_match
            FROM ankauf_... AND
            MATCH ($match_full) AGAINST (%s IN BOOLEAN MODE)
            INNER JOIN ...
            WHERE
            m.bezeichnung=%s AND
            a.bezeichnung=%s
            $priority_order
            LIMIT 100
            ;",
            $args
        )
    );
}
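The query-preparation step of the function above can be tried in isolation. The following is only a sketch: the helper name `boolean_mode_terms` is hypothetical and not part of the plugin, and the unit handling and database code are left out. It relies on preg_replace_callback_array applying its patterns one after another, so the unquoted words are rewritten first and the quoted phrases second:

```php
<?php
// Hypothetical helper for illustration; not part of the original plugin.
function boolean_mode_terms(string $query): string
{
    // Lookahead that succeeds only when an even number of double quotes
    // follows, i.e. when the current position is outside a quoted phrase.
    $outside_quotes = '(?=(?:[^"]*"[^"]*")*[^"]*\Z)';

    return preg_replace_callback_array(
        array(
            // 1. Words outside double quotes: word => +word*
            '~([^\s"]+)' . $outside_quotes . '~' => function ($m) {
                return '+' . $m[1] . '*';
            },
            // 2. Phrases between double quotes: "a b" => +"a b"
            '~"+([^"]+)"+~' => function ($m) {
                return '+"' . $m[1] . '"';
            },
        ),
        $query
    );
}

echo boolean_mode_terms('"hello world" how are you?'), PHP_EOL;
// +"hello world" +how* +are* +you?*
```

The order of the two patterns matters: the first one skips anything inside double quotes, so the phrases are still intact when the second pattern runs.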
You have to use preg_replace_callback:
$str = '"hello world" how are you?';
echo preg_replace_callback('~("[^"]+")|\S+~', function($m) {
    return isset($m[1]) ? "+" . $m[1] : "+" . $m[0] . "*";
}, $str);
Output:
+"hello world" +how* +are* +you?*
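If single-quoted phrases should be accepted as well, as in the question's pattern, the same callback idea can be extended. This is a minimal sketch: the function name `to_boolean_query` is hypothetical, and it assumes a phrase is opened and closed by the same quote character with whitespace before the opening quote:

```php
<?php
// Hypothetical helper for illustration only.
function to_boolean_query(string $text): string
{
    // Alternative 1: a phrase delimited by matching quotes (\1 repeats the
    // opening quote character). Alternative 2: any other run of non-space.
    return preg_replace_callback('~(["\'])(.+?)\1|\S+~', function ($m) {
        // $m[2] exists only when the quoted alternative matched.
        return isset($m[2])
            ? '+"' . $m[2] . '"'   // phrase: normalise to double quotes
            : '+' . $m[0] . '*';   // bare word: prefix match
    }, $text);
}

echo to_boolean_query('"hello world" how are you?'), PHP_EOL;
// +"hello world" +how* +are* +you?*
echo to_boolean_query("iPhone '8 GB'"), PHP_EOL;
// +iPhone* +"8 GB"
```

Because `\S+` consumes a whole word, including any apostrophe inside it, a word such as `Apple's` is treated as a bare word and does not open a phrase.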