Is using strpos() before preg_replace() faster?
Suppose we run this preg_replace over millions of post strings:
function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
}
Assuming only 10% of all posts contain links, would it be faster to check strpos($string, 'http') !== false before calling preg_replace()? If so, why? Doesn't preg_replace() do some kind of pre-test internally?
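Yes. A simple substring search like strpos() is much cheaper than running a regular expression (PHP caches compiled patterns, so after the first call the cost is mainly in executing the match), and that is on top of the memory copying the replacement itself has to do. If you are processing hundreds or thousands of strings it makes no difference, but if you are processing millions (especially if only 10% of them contain "http"), doing the simple search first becomes worthwhile.
Ultimately, the only way to be 100% sure is to benchmark it, but I'm fairly confident you will see some improvement by calling strpos() first.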
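A minimal sketch of the guard described above. It relies on one assumption worth stating: the pattern has no `i` flag, so it can only match text that contains the literal "http", which means skipping the regex when strpos() finds nothing cannot change the result. The name makeClickableLinksGuarded is hypothetical.

```php
<?php
// Same pattern and replacement as makeClickableLinks() in the question,
// with $1 (the whole matched URL) used for both the href and the link text.
function makeClickableLinks(string $s): ?string
{
    return preg_replace(
        '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@',
        '<a href="$1" target="_blank">$1</a>',
        $s
    );
}

// Hypothetical guarded wrapper: the pattern is case-sensitive and always
// starts with the literal "http", so a string without "http" can never
// match and the regex engine can be skipped entirely.
function makeClickableLinksGuarded(string $s): ?string
{
    if (strpos($s, 'http') === false) {
        return $s; // nothing to link; return the post unchanged
    }
    return makeClickableLinks($s);
}

$posts = [
    "Here is a great new site to visit at http://example.com so go there now!",
    "Here is a great new site to visit at ftp://example.com so go there now!",
];

foreach ($posts as $post) {
    echo makeClickableLinksGuarded($post), PHP_EOL;
}
```

Returning `$s` rather than `null` in the no-match branch keeps the guarded function a drop-in replacement: callers get the same string back that preg_replace() would have returned.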
Surprisingly, yes!
Here are the benchmarks for running each function over 10,000,000 strings:
Test 1 - a string that matches the pattern:
"Here is a great new site to visit at http://example.com so go there now!"
preg_replace alone took 10.9626309872 seconds
strpos before preg_replace took 12.6124269962 seconds ← slower
Test 2 - a string that does not match the pattern:
"Here is a great new site to visit at ftp://example.com so go there now!"
preg_replace alone took 6.51636195183 seconds
strpos before preg_replace took 2.91205692291 seconds ← faster
Test 3 - 10% of the strings match the pattern:
"Here is a great new site to visit at ftp://example.com so go there now!" (90%)
"Here is a great new site to visit at http://example.com so go there now!" (10%)
preg_replace alone took 7.43295097351 seconds
strpos before preg_replace took 4.31978201866 seconds ← faster
This is only a simple benchmark using two strings, but the speed difference is clear.
Here is the test harness for the "10%" case:
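As a rough consistency check on these numbers (simple arithmetic on the timings above, valid only for these two strings on that machine), let p be the fraction of strings that contain "http" and match. Mixing the Test 1 and Test 2 timings linearly predicts the Test 3 results and gives a break-even point for the guard:

$$
\begin{aligned}
T_{\text{preg}}(p) &\approx 10.96\,p + 6.52\,(1-p), \qquad T_{\text{strpos}}(p) \approx 12.61\,p + 2.91\,(1-p) \\
T_{\text{preg}}(0.1) &\approx 6.96\ \text{s}\ \ (\text{measured } 7.43\ \text{s}), \qquad T_{\text{strpos}}(0.1) \approx 3.88\ \text{s}\ \ (\text{measured } 4.32\ \text{s}) \\
T_{\text{strpos}}(p) = T_{\text{preg}}(p) &\iff p = \frac{6.52 - 2.91}{(6.52 - 2.91) + (12.61 - 10.96)} = \frac{3.61}{5.26} \approx 0.69
\end{aligned}
$$

So, under this linear model, the strpos() guard only stops paying off once roughly 69% or more of the strings contain a link; the 10% case is well inside the range where it helps.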
<?php
$string1 = "Here is a great new site to visit at http://example.com so go there now!";
$string2 = "Here is a great new site to visit at ftp://example.com so go there now!";

// Plain version: always runs the regex
function makeClickableLinks1($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
}

// Guarded version: runs the regex only if the string contains "http",
// otherwise returns the string unchanged
function makeClickableLinks2($s) {
    return strpos($s, 'http') !== false
        ? preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s)
        : $s;
}

/* Begin test harness */
$loops = 10000000;

/* Test using only preg_replace */
$time_start = microtime(true); // current Unix time in seconds, as a float
for ($i = 0; $i < $loops; $i++) {
    // Only 10% of strings will have "http"
    makeClickableLinks1($i % 10 ? $string2 : $string1);
}
$time = microtime(true) - $time_start;
echo "preg_replace alone took $time seconds<br/>";

/* Test using strpos before preg_replace */
$time_start = microtime(true);
for ($i = 0; $i < $loops; $i++) {
    // Only 10% of strings will have "http"
    makeClickableLinks2($i % 10 ? $string2 : $string1);
}
$time = microtime(true) - $time_start;
echo "strpos before preg_replace took $time seconds<br/>";
?>
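A variant of the harness, offered as a sketch rather than a replacement: it first checks that both versions return identical output for every input, then times them with hrtime(true), PHP's monotonic nanosecond clock (PHP 7.3+; the arrow functions need PHP 7.4+). The helper name benchmark() and the loop count are assumptions chosen for illustration; raise $loops toward the original 10,000,000 for steadier numbers.

```php
<?php
// Hypothetical helper: run $fn over the inputs round-robin and return the
// elapsed wall-clock time in seconds, measured with the monotonic clock.
function benchmark(callable $fn, array $inputs, int $loops): float
{
    $n = count($inputs);
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $loops; $i++) {
        $fn($inputs[$i % $n]);
    }
    return (hrtime(true) - $start) / 1e9;
}

$pattern = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
$replacement = '<a href="$1" target="_blank">$1</a>';

// Arrow functions capture $pattern and $replacement by value
$plain   = fn($s) => preg_replace($pattern, $replacement, $s);
$guarded = fn($s) => strpos($s, 'http') !== false
    ? preg_replace($pattern, $replacement, $s)
    : $s;

// 1 input in 10 contains a link, as in the "10%" test above
$inputs = array_merge(
    ["Here is a great new site to visit at http://example.com so go there now!"],
    array_fill(0, 9, "Here is a great new site to visit at ftp://example.com so go there now!")
);

// Before timing anything, make sure the guard does not change the output
foreach ($inputs as $s) {
    if ($plain($s) !== $guarded($s)) {
        throw new RuntimeException("Outputs differ for: $s");
    }
}

$loops = 500000;
printf("preg_replace alone:          %.3f s\n", benchmark($plain, $inputs, $loops));
printf("strpos before preg_replace:  %.3f s\n", benchmark($guarded, $inputs, $loops));
```

The output check matters because a guarded function that returns something other than the input on the no-match path (for example `null`) can look faster in a benchmark while silently dropping posts.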