Remove blacklisted terms from a string, then eliminate unnecessary spaces
I have an array of blacklisted terms:
$arrayBlacklist = array("Kota","Kab.","Kota Administrasi","KAB", "KOTA", "Kabupaten");
I have a string to sanitize:
$city = "Kota Jakarta Selatan";
// also: "Kab. Jakarta Selatan", "Kota Administrasi Jakarta Selatan", ...
I just want to remove the $arrayBlacklist values if they appear in the $city variable.
So that I end up with $city = "Jakarta Selatan".
$arrayBlacklist = array("Kota Administrasi", "Kota","Kab.","KAB", "KOTA", "Kabupaten");
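// descending sort puts "Kota Administrasi" ahead of its prefix "Kota", so the longer phrase is replaced first
rsort($arrayBlacklist);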
$city = "Kota Jakarta Selatan";
$city = trim(preg_replace('/\s+/', ' ',str_replace($arrayBlacklist, '', $city)));
You can use str_replace (https://www.php.net/manual/en/function.str-replace.php).
str_replace accepts arrays for both the search and the replace arguments.
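The order of the search array matters here, because str_replace works through the terms one after another. A minimal sketch of that effect follows; the helper name stripTerms is made up for illustration and is not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): remove terms, then tidy whitespace.
function stripTerms(string $city, array $terms): string
{
    // str_replace applies the search terms one after another, in array order.
    $stripped = str_replace($terms, '', $city);

    // Collapse runs of whitespace into one space and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$city = 'Kota Administrasi Jakarta Selatan';

// "Kota" is removed first, so "Kota Administrasi" can no longer match.
$shortFirst = stripTerms($city, ['Kota', 'Kota Administrasi']);

// Longer phrase first: the whole phrase is removed in one go.
$longFirst = stripTerms($city, ['Kota Administrasi', 'Kota']);

var_dump($shortFirst, $longFirst);
```

With the short term first, "Administrasi" is left behind; with the long term first, only "Jakarta Selatan" remains. That is why the blacklist above is sorted before it is passed to str_replace.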
Not as elegant as the other answers, but it gets the job done.
$arrayBlacklist = ['Kota', 'Kab.', 'Kota Administrasi', 'KAB', 'KOTA', 'Kabupaten'];
$city = 'Kota Jakarta Selatan';
// make an array of words from the city name
$cityAsArray = explode(' ', $city);
foreach ($cityAsArray as $key => $part) {
    // check if word is in blacklist
    if (in_array($part, $arrayBlacklist)) {
        // remove from the array if it is blacklisted
        unset($cityAsArray[$key]);
    }
}
// convert the city name back to string
$city = implode(' ', $cityAsArray);
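The same word-by-word filtering can be written more compactly with array_diff. The sketch below also shows the limitation of the approach: a multi-word term such as "Kota Administrasi" is never matched as a whole, because the city name is compared one word at a time. The function name dropBlacklistedWords is an assumption made for this example.

```php
<?php
// Hypothetical helper: drop every word of $city that is itself a blacklist entry.
function dropBlacklistedWords(string $city, array $blacklist): string
{
    // array_diff keeps the words that do not appear in the blacklist
    // (string comparison, case-sensitive).
    $kept = array_diff(explode(' ', $city), $blacklist);

    return implode(' ', $kept);
}

$blacklist = ['Kota', 'Kab.', 'Kota Administrasi', 'KAB', 'KOTA', 'Kabupaten'];

$simple = dropBlacklistedWords('Kota Jakarta Selatan', $blacklist);

// "Administrasi" on its own is not blacklisted, so it survives.
$multiWord = dropBlacklistedWords('Kota Administrasi Jakarta Selatan', $blacklist);

var_dump($simple, $multiWord);
```

The second call returns "Administrasi Jakarta Selatan", so this variant only fits blacklists made of single words.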
Update:
We can sort the blacklist array by word count, most words first, and then replace each blacklisted string with an empty string.
$arrayBlacklist = ["Kota", "Kab.", "Kota Administrasi", "KAB", "KOTA", "Kabupaten"];
$city = 'Kota Administrasi Jakarta Selatan';
usort($arrayBlacklist, function ($a, $b) {
    // terms with more spaces (more words) come first; usort expects an int, not a bool
    return substr_count($b, ' ') <=> substr_count($a, ' ');
});
foreach ($arrayBlacklist as $blacklist) {
    $city = trim(str_replace($blacklist, '', $city));
}
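- Use usort to sort the array by string length, longest first, to avoid overlap issues.
- preg_replace each string in a case-insensitive manner.
- Finally, use str_replace to replace every double space with a single space.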
Snippet:
<?php
$arrayBlacklist = array("Kota","Kab.","Kota Administrasi","KAB", "KOTA", "Kabupaten","Jakarta");
// longest terms first, so "Kota Administrasi" is tried before "Kota"
usort($arrayBlacklist, function ($a, $b) {
    return strlen($b) <=> strlen($a);
});
$city = "Kota Jakarta Selatan kota Administrasi ki";
$city = " " . $city . " "; // add spaces to ease the matching
foreach ($arrayBlacklist as $val) {
    // preg_quote() escapes regex metacharacters such as the "." in "Kab."
    $city = preg_replace('/\s' . preg_quote($val, '/') . '\s/i', '  ', $city); // replace with double spaces to avoid recursive matching
}
$city = str_replace("  ", " ", trim($city)); // turn double spaces into single spaces
echo $city;
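Update:
preg_replace matches each string only when it has whitespace on both its left and right side. Word boundaries are not used because some blacklisted strings also contain non-word characters, such as the dot in "Kab.". To ease the matching, we deliberately add a leading and a trailing space before the loop starts.
Note: we replace the string matched by preg_replace with a double space, to avoid recursive matching with other strings.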
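As an alternative to padding the string and looping, the same whitespace-delimited, case-insensitive matching can be sketched as a single pattern with lookarounds. This is a variation, not the answer's own code: the function name removeBlacklistedTerms is hypothetical, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper: one case-insensitive pass with a combined pattern.
function removeBlacklistedTerms(string $city, array $blacklist): string
{
    // Longest terms first, so the alternation prefers "Kota Administrasi" over "Kota".
    usort($blacklist, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    // Escape regex metacharacters such as the dot in "Kab.".
    $alternatives = array_map(fn(string $term): string => preg_quote($term, '/'), $blacklist);

    // (?<!\S) and (?!\S) require whitespace or a string edge on each side
    // without consuming it, so no padding spaces are needed.
    $pattern = '/(?<!\S)(?:' . implode('|', $alternatives) . ')(?!\S)/i';

    $stripped = preg_replace($pattern, '', $city);

    // Collapse the gaps left behind and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$blacklist = ['Kota', 'Kab.', 'Kota Administrasi', 'KAB', 'KOTA', 'Kabupaten', 'Jakarta'];

echo removeBlacklistedTerms('Kota Jakarta Selatan kota Administrasi ki', $blacklist), "\n";
```

Because the lookarounds do not consume the surrounding spaces, adjacent blacklisted terms can each match without the double-space replacement.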
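I think strtr() is the best tool for this job because:
- you don't need to pre-sort the blacklist array
- longer matches take precedence over shorter ones when replacing.
So, effectively, you "translate", then trim the leading/trailing spaces, then remove any redundant internal spaces.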
Code:
$arrayBlacklist = ["Kota Administrasi", "Kota","Kab.","KAB", "KOTA", "Kabupaten"];
$trans = array_fill_keys($arrayBlacklist, '');
$cities = [
    "Kota Jakarta Selatan",
    "Kota Administrasi Selatan",
    "Kab. What Kota Kab.",
    "KOTA Kota Coca Cola",
];
foreach ($cities as $city) {
    var_export(
        preg_replace('/\s{2,}/', ' ', trim(strtr($city, $trans)))
    );
    echo "\n";
}
Output:
'Jakarta Selatan'
'Selatan'
'What'
'Coca Cola'
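To illustrate the two reasons given for strtr(), here is a small sketch in which the blacklist order is deliberately varied. The wrapper name cleanCity is an assumption made for this example. It also shows a caveat: strtr() matches substrings, so a blacklisted term is removed even when it is only part of a longer word.

```php
<?php
// Hypothetical wrapper around the strtr() approach.
function cleanCity(string $city, array $blacklist): string
{
    // Map every blacklisted term to an empty string.
    $trans = array_fill_keys($blacklist, '');

    // strtr() tries the longest key first at each position and
    // does not rescan text it has already replaced.
    return preg_replace('/\s{2,}/', ' ', trim(strtr($city, $trans)));
}

$city = 'Kota Administrasi Jakarta Selatan';

// Same result whichever way round the terms are listed.
$shortFirst = cleanCity($city, ['Kota', 'Kota Administrasi']);
$longFirst  = cleanCity($city, ['Kota Administrasi', 'Kota']);

// Caveat: "Kota" is also stripped from inside a longer word.
$substring = cleanCity('Kotabaru', ['Kota']);

var_dump($shortFirst, $longFirst, $substring);
```

Both orderings give "Jakarta Selatan", while "Kotabaru" is reduced to "baru". If terms must only match as whole words, a whitespace-delimited regular expression is the safer choice.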