Remove blacklisted terms from a string, then eliminate unnecessary spaces

I have an array of blacklisted terms:

$arrayBlacklist = array("Kota","Kab.","Kota Administrasi","KAB", "KOTA", "Kabupaten");

I have a string that needs to be cleaned:

$city = "Kota Jakarta Selatan";
// also: "Kab. Jakarta Selatan", "Kota Administrasi Jakarta Selatan", ...

I only want to remove the $arrayBlacklist values where they occur in the $city variable.

So the result should be $city = "Jakarta Selatan".

$arrayBlacklist = array("Kota Administrasi", "Kota","Kab.","KAB", "KOTA", "Kabupaten");
rsort($arrayBlacklist);
$city = "Kota Jakarta Selatan";
        
$city = trim(preg_replace('/\s+/', ' ',str_replace($arrayBlacklist, '', $city)));

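You can use https://www.php.net/manual/en/function.str-replace.php

str_replace accepts arrays for both the search and the replacement arguments.

Not as elegant as the other answers, but it gets the job done.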
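
One detail worth keeping in mind: when the search argument is an array, str_replace() applies the terms in array order, each one to the result of the previous replacement. The following is a minimal sketch of why the order of the blacklist matters; the variable names $shortFirst and $longFirst are illustrative and not from the original posts.

```php
<?php
$city = "Kota Administrasi Jakarta Selatan";

// Shorter term first: "Kota" is removed before "Kota Administrasi" can match,
// so "Administrasi" is left behind.
$shortFirst = str_replace(["Kota", "Kota Administrasi"], '', $city);

// Longer term first: the whole phrase is removed in one go.
$longFirst = str_replace(["Kota Administrasi", "Kota"], '', $city);

echo trim($shortFirst), "\n"; // Administrasi Jakarta Selatan
echo trim($longFirst), "\n";  // Jakarta Selatan
```

This is why the array-based answers either list the longer terms first or sort the blacklist before replacing.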

$arrayBlacklist = ['Kota', 'Kab.', 'Kota Administrasi', 'KAB', 'KOTA', 'Kabupaten'];
$city = 'Kota Jakarta Selatan'; 

// make an array of words from the city name
$cityAsArray = explode(' ', $city);

foreach ($cityAsArray as $key => $part) {
    // check if word is in blacklist
    if (in_array($part, $arrayBlacklist)) {
        // remove from the array if it is blacklisted
        unset($cityAsArray[$key]);
    }
}

// convert the city name back to string
$city = implode(' ', $cityAsArray);
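
Because this version compares one word at a time, a multi-word entry such as "Kota Administrasi" can never match as a whole. A small sketch of that limitation follows, using a hypothetical helper named removeBlacklistedWords (an illustrative name, not part of the original answer) that is intended to behave like the loop above.

```php
<?php
// Hypothetical helper: word-by-word filtering, equivalent to the loop above.
function removeBlacklistedWords(string $city, array $blacklist): string
{
    $words = explode(' ', $city);
    // keep only the words that are not in the blacklist
    $kept = array_filter($words, function ($word) use ($blacklist) {
        return !in_array($word, $blacklist, true);
    });
    return implode(' ', $kept);
}

$arrayBlacklist = ['Kota', 'Kab.', 'Kota Administrasi', 'KAB', 'KOTA', 'Kabupaten'];

$single = removeBlacklistedWords('Kota Jakarta Selatan', $arrayBlacklist);
$multi  = removeBlacklistedWords('Kota Administrasi Jakarta Selatan', $arrayBlacklist);

echo $single, "\n"; // Jakarta Selatan
echo $multi, "\n";  // Administrasi Jakarta Selatan ("Administrasi" alone is not blacklisted)
```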

Update: we can sort the blacklist array by word count, so that multi-word entries come first, and then str_replace each blacklisted string.

$arrayBlacklist = ["Kota", "Kab.", "Kota Administrasi", "KAB", "KOTA", "Kabupaten"];
$city = 'Kota Administrasi Jakarta Selatan';
usort($arrayBlacklist, function ($a, $b) {
    // most words first; usort() expects an integer result, not a boolean
    return substr_count($b, ' ') <=> substr_count($a, ' ');
});

foreach ($arrayBlacklist as $blacklist) {
    $city = trim(str_replace($blacklist, '', $city));
}
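  • Use usort to sort the array by string length, longest first, to avoid overlap problems.
  • preg_replace each string in a case-insensitive manner.
  • Finally, use str_replace to replace every double space with a single space.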

Snippet:

<?php

$arrayBlacklist = array("Kota","Kab.","Kota Administrasi","KAB", "KOTA", "Kabupaten","Jakarta");

usort($arrayBlacklist,function($a,$b){
    return strlen($b) <=> strlen($a);
});


$city = "Kota Jakarta Selatan kota Administrasi ki";
$city = " ". $city. " "; // add spaces to ease the matching

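foreach($arrayBlacklist as $val){
   // preg_quote() escapes regex metacharacters such as the "." in "Kab."
   $city = preg_replace('/\s'.preg_quote($val, '/').'\s/i','  ',$city); // replace with double spaces to avoid recursive matching
}

// one str_replace() pass can leave a double space behind when three or more are adjacent, so repeat
$city = trim($city);
while (strpos($city, "  ") !== false) {
    $city = str_replace("  "," ",$city);
}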
echo $city;

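Update:

preg_replace matches each string only where it is surrounded by whitespace on both the left and the right, because the blacklisted strings sometimes contain non-word characters as well. To make the matching easier, we deliberately add leading and trailing spaces before the loop starts.

Note: we replace each string matched by preg_replace with a double space to avoid recursive matching with the other strings.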
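
To illustrate the point about non-word characters: a \b word boundary, the usual alternative to whitespace matching, does not behave as one might hope for a term like "Kab." that ends in a dot. The sketch below is only an illustration; the variable names are made up for this example.

```php
<?php
$city = " Kab. Jakarta Selatan "; // padded with spaces, as in the snippet above
$term = preg_quote("Kab.", '/');  // escape the "." so it matches a literal dot

// \b needs a word character on exactly one side; between "." and " " there is none,
// so the trailing \b fails and the term is not found.
$withWordBoundary = preg_match('/\b' . $term . '\b/i', $city);

// Requiring whitespace on both sides works regardless of the characters in the term.
$withWhitespace = preg_match('/\s' . $term . '\s/i', $city);

var_dump($withWordBoundary); // int(0)
var_dump($withWhitespace);   // int(1)
```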

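I think strtr() is the best tool for this job, because:

  1. You don't need to pre-sort the blacklist array.
  2. Longer matches take precedence over shorter ones during replacement.

So effectively, you "translate", then trim the leading/trailing spaces, then remove all redundant internal spaces.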

Code:

$arrayBlacklist = ["Kota Administrasi", "Kota","Kab.","KAB", "KOTA", "Kabupaten"];
$trans = array_fill_keys($arrayBlacklist, '');

$cities = [
    "Kota Jakarta Selatan",
    "Kota Administrasi Selatan",
    "Kab. What Kota Kab.",
    "KOTA Kota Coca Cola",
];
        
foreach ($cities as $city) {
    var_export(
        preg_replace('/\s{2,}/', ' ', trim(strtr($city, $trans)))
    );
    echo "\n";
}

Output:

'Jakarta Selatan'
'Selatan'
'What'
'Coca Cola'
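
For reuse, the same strtr() steps could be wrapped in a small function. This is only a sketch under the assumptions of this thread; cleanCity is a hypothetical name, and the blacklist is deliberately left unsorted to show that the order does not matter here.

```php
<?php
// Hypothetical helper wrapping the strtr() approach shown above.
function cleanCity(string $city, array $blacklist): string
{
    // map every blacklisted term to an empty string
    $trans = array_fill_keys($blacklist, '');
    // strtr() tries the longest key first at each position, so no sorting is needed
    $stripped = trim(strtr($city, $trans));
    // collapse any run of two or more whitespace characters into one space
    return preg_replace('/\s{2,}/', ' ', $stripped);
}

// Deliberately unsorted: "Kota" appears before "Kota Administrasi".
$arrayBlacklist = ["Kota", "Kab.", "Kota Administrasi", "KAB", "KOTA", "Kabupaten"];

$a = cleanCity("Kota Administrasi Jakarta Selatan", $arrayBlacklist);
$b = cleanCity("Kab. Jakarta Selatan", $arrayBlacklist);

echo $a, "\n"; // Jakarta Selatan
echo $b, "\n"; // Jakarta Selatan
```

Note that, like str_replace(), strtr() matches substrings and is case-sensitive, so a blacklisted term embedded in a longer word would be removed as well, and a lowercase "kota" would be left in place.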