Most efficient way to find matching words from a paragraph
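I have a paragraph that I need to parse for different keywords. For example, the paragraph:
"I want to make a change in the world. Want to make it a better place to live. Peace, Love and Harmony. It is all life is all about. We can make our world a good place to live"
My keywords are
"world", "earth", "place"
I should report whenever there is a match, and how many times each keyword matched.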
The output should be:
"world" 2 times and "place" 2 times
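Currently, I just convert the paragraph string into an array of characters and then match each keyword against the whole array.
That wastes resources.
Please point me to an efficient approach. (I am using PHP.)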
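I would use preg_match_all(). Here is how it would look in your code. The function itself returns the number of matches found, while the $matches array holds the matches themselves:
<?php
$string = "world";
$paragraph = "I want to make a change in the world. Want to make it a better place to live. Peace, Love and Harmony. It is all life is all about. We can make our world a good place to live";
// PCRE patterns need delimiters; preg_quote() escapes any regex metacharacters in the keyword.
$pattern = '/' . preg_quote($string, '/') . '/';
// Pass $matches directly: call-time pass-by-reference (&$matches) is a fatal error since PHP 5.4.
if (preg_match_all($pattern, $paragraph, $matches)) {
    echo $string . ' ' . count($matches[0]) . " times";
} else {
    echo "match NOT found";
}
?>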
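To make the difference between the return value and $matches concrete, here is a minimal sketch. The sample string is made up for illustration and is not from the question. With the default PREG_PATTERN_ORDER flag, $matches[0] should hold every full match.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = "one world, two worlds, World";

// The return value is the number of full pattern matches.
// The pattern is case-sensitive, so "World" is not counted,
// but the "world" inside "worlds" is.
$count = preg_match_all('/world/', $text, $matches);

echo $count, "\n";     // how many matches were found
print_r($matches[0]);  // the matched substrings themselves
```

Note that without word boundaries the pattern also matches inside longer words such as "worlds".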
<?php
// Build a report line for every keyword that occurs at least once.
function woohoo($terms, $para) {
    $result = "";
    foreach ($terms as $keyword) {
        // substr_count() is case-sensitive and counts substrings, not whole words.
        $cnt = substr_count($para, $keyword);
        if ($cnt) {
            $result .= $keyword . " found " . $cnt . " times<br>";
        }
    }
    return $result;
}

$terms = array('world', 'earth', 'place');
$para = "I want to make a change in the world. Want to make it a better place to live.";
$r = woohoo($terms, $para);
echo $r;
?>
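One thing to keep in mind with substr_count() is that it is case-sensitive and counts substrings rather than whole words. The sketch below compares it with a word-boundary pattern on a made-up sentence (not from the question). Calling preg_match_all() with only two arguments assumes PHP 5.4 or later.

```php
<?php
// Hypothetical sentence chosen to expose both differences.
$para = "World leaders call the underworld worldly.";

// Case-sensitive substring count: skips "World", but counts the
// "world" inside "underworld" and inside "worldly".
$substr = substr_count($para, 'world');

// Whole-word, case-insensitive count: only "World" qualifies.
$regex = preg_match_all('/\bworld\b/i', $para);

echo "substr_count: $substr\n";
echo "preg_match_all: $regex\n";
```

Which count is the right one depends on whether partial and differently cased matches should be reported.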
As @CasimiretHippolyte commented, regex is the better approach because word boundaries can be used. Caseless matching is also possible with the i flag. Combine this with the return value of preg_match_all:
Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
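The pattern for matching a single word is /\bword\b/i. Generate an array whose keys are the search terms from $words and whose values are the corresponding word counts, as returned by preg_match_all: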
$words = array("earth", "world", "place", "foo");
$str = "at Earth Hour the world-lights go out and make every place on the world dark";
// Map each word to its whole-word, case-insensitive match count.
$res = array_combine($words, array_map(function ($w) use ($str) {
    return preg_match_all('/\b' . preg_quote($w, '/') . '\b/i', $str);
}, $words));
print_r($res);
Test at eval.in; the output is:
Array
(
[earth] => 1
[world] => 2
[place] => 1
[foo] => 0
)
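preg_quote is used to escape the words, which is unnecessary if you know they contain no special characters. Using an inline anonymous function with array_combine requires PHP 5.3.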
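If the keyword list is long, running one regex per keyword scans the text once for each word. As an alternative sketch (a different technique from the answers above, not something they propose), the text can be tokenized once and the keywords looked up in a frequency table. The helper name count_keywords is hypothetical, and the sketch assumes ASCII text and keywords made only of letters, digits, or underscores.

```php
<?php
/**
 * Count whole-word, case-insensitive occurrences of each keyword
 * by tokenizing the text a single time. Hypothetical helper.
 */
function count_keywords(array $keywords, $text)
{
    // Split on runs of non-word characters and drop empty pieces.
    $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Map each distinct token to how often it occurs.
    $freq = array_count_values($tokens);

    $result = array();
    foreach ($keywords as $keyword) {
        $key = strtolower($keyword);
        // Keywords that never occur are reported with a count of 0.
        $result[$keyword] = isset($freq[$key]) ? $freq[$key] : 0;
    }
    return $result;
}

$str = "at Earth Hour the world-lights go out and make every place on the world dark";
print_r(count_keywords(array("earth", "world", "place", "foo"), $str));
```

Because the split happens on non-word characters, "world-lights" yields the token "world", which matches what the \b word boundaries do for this input. Multi-word keywords would not work with this approach.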