Which is more efficient: strpos or preg_match?

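Following on from this question:

I understand that my pattern must contain only one word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange", and I can't modify my regexp because these two possibilities are given dynamically from a database!

So my question is: which is the better solution, over 200 preg_match calls or the same number of strpos calls, to check whether a piece of text contains these words?

Here is my attempt at the possible code for both solutions: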

$array = array(/* 200+ values */);
foreach ($array as $word)
{
    $pattern='<\b(?:'.$word.')\b>i';
    preg_match_all($pattern, $text, $matches);
    $fields['skill'][] = $matches[0][0];
}

The alternative is:

$array = array(/* 200+ values */);
foreach ($array as $word)
{
    if (strpos($text, $word) > -1)
    {
        $fields['skill'][] = $word;
    }
}

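Regex-based functions are slower than most other string functions.

By the way, your test can also be done with a single regex, along the lines of $pattern='<\b(?:'.$word1.'|'.$word2.'|'.$word3.'|'.$word4.')\b>i'; how many words you can use at once depends on the length of the regex. While testing I built a regex 12004 characters long, and that did not seem to be the maximum.

Regex version (single call):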

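$array = array(/* 200+ values */);

// Escape regex metacharacters in each word before joining them with "|".
$pattern = '<\b(?:' . implode('|', array_map('preg_quote', $array)) . ')\b>i';
preg_match_all($pattern, $text, $matches);
// $matches[0] now holds every match found in $text.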
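
As a minimal sketch of the single-call approach, the snippet below builds one pattern from a small word list and collects the matches. The sample `$words` and `$text` values are made up for illustration. Sorting the alternatives longest-first is an assumption of this sketch, not part of the original answer, so that "microsoft exchange" is tried before "microsoft" at the same position; it also uses `/` as the delimiter.

```php
<?php
// Hypothetical sample data; in the question the words come from a database.
$words = array('microsoft', 'microsoft exchange', 'php');
$text  = 'Experience with Microsoft Exchange, PHP and Microsoft tools.';

// Try longer alternatives first so "microsoft exchange" wins over "microsoft".
usort($words, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Escape regex metacharacters and the "/" delimiter in every word.
$quoted = array_map(function ($w) {
    return preg_quote($w, '/');
}, $words);

$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
preg_match_all($pattern, $text, $matches);

// Normalise case and drop duplicates to get the list of skills found.
$found = array_values(array_unique(array_map('strtolower', $matches[0])));
print_r($found);
```

Note that a single alternation never reports overlapping matches: where the text says "Microsoft Exchange", only the longer phrase is returned for that position, and "microsoft" is reported only because it also appears on its own later in the sample text.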

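strpos version (multiple calls):

$array = array(/* 200+ values */);
foreach ($array as $word) {
    // strpos($haystack, $needle) returns false when nothing is found,
    // so test with !== false rather than > -1.
    if (strpos($text, $word) !== false) {
        $fields['skill'][] = $word;
    }
}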

If you are looking for single words, keep in mind that strpos will match Hello inside HelloWorld, so if you only want real, separate words you can do this:

$arrayOfWords = explode(' ', $text);
// Now you can check one array against the other.
$array = array(/* 200+ values */);
foreach ($array as $word) {
    if (in_array($word, $arrayOfWords)) {
        $fields['skill'][] = $word;
    }
}
// You can make this faster still by calling array_flip() on $arrayOfWords
// and then checking with isset(), which is faster than in_array().
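
The array_flip and isset idea from the comments above might look roughly like this. The sample `$text` and `$words` are invented for illustration, and lower-casing both sides is an assumption of this sketch, added to mimic the case-insensitive `i` flag of the regex version.

```php
<?php
// Hypothetical sample data for illustration.
$text  = 'Worked with PHP and MySQL on Linux servers';
$words = array('php', 'linux', 'oracle');

// Split on spaces, lower-case the tokens, and flip them into array keys.
$lookup = array_flip(explode(' ', strtolower($text)));

$fields = array('skill' => array());
foreach ($words as $word) {
    // isset() on an array key is a hash lookup instead of a linear scan.
    if (isset($lookup[strtolower($word)])) {
        $fields['skill'][] = $word;
    }
}
print_r($fields['skill']);
```

The flip is done once, so each of the 200+ lookups avoids rescanning the whole token list the way in_array does.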

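Note that word combinations you also want to match (such as "microsoft exchange") cannot be found this way, because splitting the text on spaces leaves only single words to compare against.

*Comments added

strpos is much faster than preg_match; here is a benchmark: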
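
One possible way to keep whole-word behaviour for phrases without a regex is to check the characters around each stripos hit. The helper below, `containsWholeWord`, is hypothetical and not part of the original answer. It treats only ASCII letters and digits as word characters, which is a simplification compared with `\b`.

```php
<?php
// Hypothetical helper (not from the original answer): true when $word occurs
// in $text as a whole word or phrase, compared case-insensitively.
function containsWholeWord($text, $word)
{
    $len = strlen($word);
    if ($len === 0) {
        return false;
    }
    $offset = 0;
    while (($pos = stripos($text, $word, $offset)) !== false) {
        $end = $pos + $len;
        // The characters just before and after the hit must not be alphanumeric.
        $startOk = $pos === 0 || !ctype_alnum($text[$pos - 1]);
        $endOk   = $end >= strlen($text) || !ctype_alnum($text[$end]);
        if ($startOk && $endOk) {
            return true;
        }
        $offset = $pos + 1; // keep scanning after this partial hit
    }
    return false;
}

var_dump(containsWholeWord('We use Microsoft Exchange daily', 'microsoft exchange'));
var_dump(containsWholeWord('HelloWorld', 'Hello'));
```

With this check, "microsoft exchange" is found as a phrase while Hello inside HelloWorld is rejected.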

$array = array();
for($i=0; $i<1000; $i++) $array[] = $i;
$nbloop = 10000;
$text = <<<EOD
I understand that my pattern must contain only a word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange" and I can't modify my regexp because these two possibilities are given dinamically from a database!

So my question is: which is the better solution between over 200 preg_match and the same numbers of str_pos to check if a subset of char contains these words?
EOD;

$start = microtime(true);
for ($i=0; $i<$nbloop; $i++) {
    foreach ($array as $word) {
        $pattern='<\b(?:'.$word.')\b>i';
        if (preg_match_all($pattern, $text, $matches)) {
            $fields['skill'][] = $matches[0][0];
        }
    }
}
echo "Elapse regex: ", microtime(true)-$start,"\n";


$start = microtime(true);
for ($i=0; $i<$nbloop; $i++) {
    foreach ($array as $word) {
        // strpos() takes the haystack first; the cast makes the integer needle a string.
        if (strpos($text, (string) $word) !== false) {
            $fields['skill'][] = $word;
        }
    }
}
echo "Elapse strpos: ", microtime(true)-$start,"\n";

Output (from the original run; timings vary by machine and PHP version):

Elapse regex: 7.9924139976501
Elapse strpos: 0.62015008926392

About 13 times faster.
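
To compare the single combined regex from the other answer against a loop of stripos calls, a small harness along these lines could be used. The `timeIt` helper and the sample data are hypothetical. The two variants are not strictly equivalent (the regex enforces word boundaries and collects every match, while stripos stops at the first hit), so treat any numbers as a rough guide only.

```php
<?php
// Hypothetical harness: runs $fn $loops times and returns elapsed seconds.
function timeIt(callable $fn, $loops)
{
    $start = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up sample data; substitute the real word list and text.
$words = array('microsoft', 'exchange', 'php', 'mysql', 'linux');
$text  = str_repeat('Some text mentioning PHP and Microsoft Exchange. ', 20);

$quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

// One preg_match_all call per iteration, covering all words at once.
$regexTime = timeIt(function () use ($pattern, $text) {
    preg_match_all($pattern, $text, $matches);
}, 1000);

// One stripos call per word per iteration.
$striposTime = timeIt(function () use ($words, $text) {
    foreach ($words as $word) {
        $hit = stripos($text, $word) !== false;
    }
}, 1000);

printf("single regex: %.4f s\nstripos loop: %.4f s\n", $regexTime, $striposTime);
```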