Top 10 keywords in a string (PHP)

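My goal is to show the top 10 words in a string, but my code for building the keyword array has become complicated.

I also want to show only important words, not words like "The, That, to, a...".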

Full code:

$str= $db_tag;
    $tok = strtok($str, ", ");
    $subStrStart = 0;

    while ($tok !== false) {
        preg_match_all("/\b" . preg_quote($tok, "/") . "\b/", substr($str, $subStrStart), $m);
        if(count($m[0]) >= 10)
            echo "'" . $tok . "' found more than 10 times, exactly: " . count($m[0]) . "<br>";
        $subStrStart += strlen($tok);
        $tok = strtok(", ");
    }    

My string:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

Thanks in advance.

If by "Top 10" you mean the 10 most frequently used words in the comma-separated string, you can do it like this:

$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

//Create array of words split by ","
$words = explode(",",$string);

//Create an empty array to hold data
$wordData = [];

foreach($words as $word){
    //Convert to lower case (for uniformity)
    $word = strtolower($word);

    //Add to an array if doesn't exist; if it does,
    //add to the number
    if(isset($wordData[$word])){
        $wordData[$word]++;
    } else $wordData[$word] = 1;
}

//Order $wordData array by number
arsort($wordData);

print_r($wordData);

This will output (the order of words with the same count may vary between PHP versions):

Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [the] => 2 [expectations] => 1 [will] => 1 [of] => 1 [that] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )


To filter out specific words:

//Establish array of words to filter
$filterWords = ["the", "is", "are", "of", "that"];

//Remove those words from the array created earlier
foreach($filterWords as $fw){
    if(isset($wordData[$fw])) unset($wordData[$fw]);
}

print_r($wordData);

This will output:

Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [expectations] => 1 [will] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )
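
Neither snippet above limits the result to ten entries. Here is a minimal sketch of that last step, assuming the same counting approach. Using array_diff_key() with array_flip() is just one possible way to drop the filter words, and the names $counts and $topTen are illustrative:

```php
<?php
$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$filterWords = ["the", "is", "are", "of", "that"];

// Count words case-insensitively
$counts = array_count_values(array_map('strtolower', explode(",", $string)));

// Drop the filter words: array_flip() turns them into keys for array_diff_key()
$counts = array_diff_key($counts, array_flip($filterWords));

// Sort by count, highest first, keeping the word => count association
arsort($counts);

// Keep only the first ten entries (true preserves integer-like keys as well)
$topTen = array_slice($counts, 0, 10, true);

print_r($topTen);
```

Several words occur only once, so which of them fill the remaining places depends on how the sort orders equal counts.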

Try this:

$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

$stopWords = array(
    "the", "to", "in", "a", "of", "is", "that", "will", "and", "be"
);

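// Convert to array and filter out stop words (case-insensitively).
// Note the argument order: array_filter() takes the array first, then the callback.
$words = array_filter(explode(',', $db_tag), function ($value) use ($stopWords) {
    return !in_array(strtolower($value), $stopWords);
});

$counts = array_count_values($words);
asort($counts);
$topTen = array_reverse(array_slice($counts, -10, null, true));

var_dump($topTen);

On PHP 8.0 or later, where sorting is stable, you should see something like this (on older versions the order of words with equal counts may differ):

php > var_dump($topTen);
array(10) {
  ["England"]=>
  int(5)
  ["Bank"]=>
  int(5)
  ["Brexit"]=>
  int(5)
  ["Vote"]=>
  int(4)
  ["Economy"]=>
  int(4)
  ["Cut"]=>
  int(1)
  ["Mount"]=>
  int(1)
  ["Expectations"]=>
  int(1)
  ["As"]=>
  int(1)
  ["Week"]=>
  int(1)
}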

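First, we split the string into an array with explode() and filter out the stop words with array_filter(). Then array_count_values() returns an array of the unique values, each associated with the number of times it appears in the string.

Next, we sort the array in place by value with asort(). Then we slice off the last 10 elements (the highest ones) with array_slice() and reverse them with array_reverse() to put them in descending order (optional).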
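
Because several words share the same count, which of them make the cut can depend on how the sort orders ties. Below is a sketch of the steps described above, wrapped in a function with a deterministic alphabetical tie-break. The name topKeywords() and its parameters are hypothetical and not part of the answer:

```php
<?php
// Hypothetical helper: returns the $limit most frequent words as word => count.
function topKeywords(string $csv, array $stopWords, int $limit = 10): array
{
    // Split, normalise case, and drop stop words and empty entries
    $words = array_filter(
        array_map('strtolower', explode(',', $csv)),
        function ($word) use ($stopWords) {
            return $word !== '' && !in_array($word, $stopWords, true);
        }
    );

    $counts = array_count_values($words);

    // Highest count first; ties are broken alphabetically so the result
    // does not depend on the sort implementation
    uksort($counts, function ($a, $b) use ($counts) {
        return ($counts[$b] <=> $counts[$a]) ?: strcmp((string) $a, (string) $b);
    });

    return array_slice($counts, 0, $limit, true);
}

$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$stopWords = ["the", "to", "in", "a", "of", "is", "that", "will", "and", "be"];

print_r(topKeywords($db_tag, $stopWords));
```

Lower-casing before counting means "The" and "the" are treated as one word, at the cost of losing the original capitalisation in the result.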

You can use explode() and an array:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$array = array();
foreach (explode(',', $db_tag) as $val) 
{
    if(!isset($array[$val]))
    {
        $array[$val] = 1;
    }
    else
    {
        $array[$val]++;
    }
}
arsort($array);
print_r($array);

This will output:

Array
(
    [England] => 5
    [Bank] => 5
    [Brexit] => 5
    [Vote] => 4
    [Economy] => 4
    [The] => 2
    [Expectations] => 1
    [Will] => 1
    [Of] => 1
    [That] => 1
    [Mount] => 1
    [This] => 1
    [As] => 1
    [Week] => 1
    [Boost] => 1
    [Post] => 1
    [A] => 1
    [Given] => 1
    [Be] => 1
    [Could] => 1
    [Cut] => 1
)
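
The counting loop above does the same job as PHP's built-in array_count_values(), so the two can be compared directly. A small sketch, with $byLoop and $byBuiltin as illustrative names:

```php
<?php
$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$words = explode(',', $db_tag);

// Manual counting, as in the loop above
$byLoop = array();
foreach ($words as $val) {
    if (!isset($byLoop[$val])) {
        $byLoop[$val] = 1;
    } else {
        $byLoop[$val]++;
    }
}

// Built-in equivalent
$byBuiltin = array_count_values($words);

// Same keys, same counts, same order
var_dump($byLoop === $byBuiltin); // bool(true)
```

Both variants count case-sensitively, so "The" and "the" would be counted as different words unless the input is lower-cased first.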

Use the following function to extract search keywords from a string:

function getKeywords($string)
{
    // Vowels are stripped from the text; the ignore list appears to hold the
    // vowel-stripped forms of common words such as "the", "they" and "she"
    $vowels = ["a", "e", "i", "o", "u"];
    $ignore = ["th", "thy", "sh"];

    // Lower-case first so that upper-case vowels are stripped too
    $string = str_replace($vowels, "", strtolower($string));

    // Create array of words split by " "
    $words = explode(" ", $string);

    // Create an empty array to hold data
    $wordData = [];

    foreach ($words as $word) {
        $word = trim($word);
        // Skip very short words and ignored words
        if (strlen($word) < 3) {
            continue;
        }
        if (in_array($word, $ignore)) {
            continue;
        }
        // Add to the array if it doesn't exist; if it does,
        // increment its count
        if (isset($wordData[$word])) {
            $wordData[$word]++;
        } else {
            $wordData[$word] = 1;
        }
    }

    // Order $wordData array by count, highest first
    arsort($wordData);

    // Return the ten most frequent keys as a comma-separated string
    return implode(",", array_slice(array_keys($wordData), 0, 10));
}

// Example usage with a sample text
$text = "North Korea has recently introduced a sweeping new law which seeks to stamp out any kind of foreign influence - harshly punishing anyone caught with foreign films, clothing or even using slang. But why?Yoon Mi-so says she was 11 when she first saw a man executed for being caught with a South Korean drama.    His entire neighbourhood was ordered to watch. If you didn't, it would be classed as treason, she told the BBC from her home in Seoul.        The North Korean guards were making sure everyone knew the penalty for smuggling illicit videos was death. I have a strong memory of the man who was blindfolded, I can still see his tears flow down. That was traumatic for me. The blindfold was completely drenched in his tears. ";
echo getKeywords($text);
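
The function above splits on spaces only, so punctuation stays attached to the words ("slang." and "slang" would be counted separately). Below is a sketch of an alternative that tokenises with preg_split() and filters by a stop-word list instead of stripping vowels. The name extractKeywords() and its parameters are hypothetical, and the pattern assumes plain ASCII English text:

```php
<?php
// Hypothetical helper: returns up to $limit keywords from free text, most frequent first.
function extractKeywords(string $text, array $stopWords, int $limit = 10): array
{
    // Split on anything that is not a lower-case letter, digit or apostrophe
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words and words shorter than three characters
    $words = array_filter($words, function ($word) use ($stopWords) {
        return strlen($word) >= 3 && !in_array($word, $stopWords, true);
    });

    // Count occurrences and sort by count, highest first
    $counts = array_count_values($words);
    arsort($counts);

    // array_keys() yields the words; strval keeps purely numeric words as strings
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

$text = "The cat and the hat. The cat sat on the hat; the CAT ran!";
print_r(extractKeywords($text, ["the", "and"], 2));
```

Returning an array rather than a comma-separated string leaves the formatting to the caller, for example implode(",", extractKeywords(...)).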