Most efficient way to check if $string starts with $needle in perl

Question

Given two string variables $string and $needle in perl, what is the most efficient way to check whether $string starts with $needle?

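$string =~ /^\Q$needle\E/ is the closest match I could think of. It does what is required, but it is the least efficient of the solutions I have tried (so far).
index($string, $needle) == 0 works and is relatively efficient for some values of $string and $needle, but if the needle is not found at the start, it needlessly keeps searching for it at other positions.
substr($string, 0, length($needle)) eq $needle should be quite simple and efficient, but in most of my few tests it was no more efficient than the previous one.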
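The three candidate tests above can be wrapped as small predicates to check that they agree. This is a minimal sketch; the sub names `by_regex`, `by_index` and `by_substr` are hypothetical. The `\Q...\E` in the regex version matters: without it, a needle such as `a.c` would be treated as a pattern and would also match a string starting with `abc`.

```perl
use strict;
use warnings;

# Regex: anchored at the start; \Q...\E quotes any metacharacters in $needle.
sub by_regex  { my ($string, $needle) = @_; return $string =~ /^\Q$needle\E/ ? 1 : 0 }

# index: true only if the first occurrence is at offset 0.
# On a mismatch at the start it keeps scanning the rest of the string.
sub by_index  { my ($string, $needle) = @_; return index($string, $needle) == 0 ? 1 : 0 }

# substr + eq: compare exactly length($needle) leading characters.
sub by_substr { my ($string, $needle) = @_; return substr($string, 0, length $needle) eq $needle ? 1 : 0 }

for my $case (['somewhat not so longish string', 'somew'],
              ['abcdef', 'a.c'],      # '.' must be literal, so no match
              ['a.cdef', 'a.c']) {    # literal prefix, so a match
    my ($s, $n) = @$case;
    printf "%-32s %-6s regex=%d index=%d substr=%d\n",
        $s, $n, by_regex($s, $n), by_index($s, $n), by_substr($s, $n);
}
```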

Is there a canonical way to do this in perl that I'm not aware of, or any way to optimise any of the above solutions?

(In my particular use case, $string and $needle will be different in each run, so precompiling a regexp is not an option.)

Example of how I measure the performance of a given solution (here from a POSIX sh):

string='somewhat not so longish string' needle='somew'
time perl -e '
  ($n,$string,$needle) = @ARGV;
  for ($i=0;$i<$n;$i++) {

    index($string, $needle) == 0

  }' 10000000 "$string" "$needle"

With those values, index() performs better than substr()+eq on this system with perl 5.14.2, but with:

string="aaaaabaaaaabaaaaabaaaaabaaaaabaaaaab" needle="aaaaaa"

it's the other way round.
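Instead of timing a shell loop with `time`, the core `Benchmark` module can compare the candidates directly on both inputs from the question. This is a minimal sketch: the iteration count is an arbitrary assumption, and the resulting rates depend on the perl version and machine, so no figures are shown here. In the second input the needle never occurs anywhere in the string, which is the case where index() has to scan the whole string before giving up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The two inputs from the question: the first has the needle as a prefix,
# the second does not contain the needle at all.
my @inputs = (
    ['somewhat not so longish string', 'somew'],
    ['aaaaabaaaaabaaaaabaaaaabaaaaabaaaaab', 'aaaaaa'],
);

for my $pair (@inputs) {
    my ($string, $needle) = @$pair;
    print "string='$string' needle='$needle'\n";
    # cmpthese runs each code ref the given number of times and
    # prints a table of relative rates.
    cmpthese(500_000, {
        regex  => sub { my $r = $string =~ /^\Q$needle\E/ },
        index  => sub { my $r = index($string, $needle) == 0 },
        substr => sub { my $r = substr($string, 0, length $needle) eq $needle },
    });
}
```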

Answer 1

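How much does this really matter? I ran a number of benchmarks: the index method averaged 0.68μs per iteration, the regex method 1.14μs, and the substr method 0.16μs. Even in my worst case (two equal 2250-character strings), index took 2.4μs, the regex 5.7μs, and substr 0.5μs.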

My advice is to write a library routine:

# Returns true if $_[0] (the string) starts with $_[1] (the needle).
sub begins_with
{
    return substr($_[0], 0, length($_[1])) eq $_[1];
}

and focus your optimisation efforts elsewhere.
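A quick usage sketch of `begins_with` that exercises the edge cases worth checking: an empty needle (always a prefix), a needle longer than the string (substr simply returns the shorter string, so the comparison fails), and a needle containing regex metacharacters (harmless here, since no regex is involved). The test strings are illustrative only.

```perl
use strict;
use warnings;

# True if $_[0] starts with $_[1]; compares only the first length($_[1]) chars.
sub begins_with
{
    return substr($_[0], 0, length($_[1])) eq $_[1];
}

my @cases = (
    ['somewhat not so longish string', 'somew'],  # ordinary prefix
    ['somewhat', ''],                             # empty needle
    ['abc', 'abcdef'],                            # needle longer than string
    ['a.cdef', 'a.c'],                            # metacharacters are literal...
    ['abcdef', 'a.c'],                            # ...so this one does not match
);
for my $c (@cases) {
    printf "begins_with('%s', '%s') = %s\n", @$c, begins_with(@$c) ? 'true' : 'false';
}
```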

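Update: Based on criticism of the "worst-case" scenario described above, I ran a new set of benchmarks on a randomly generated 20,000-character string, comparing it against itself and against a string that differed only in the last byte.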

For strings this long, the regex solution is by far the worst (a 20,000-character regex is hell): 105μs for a successful match and 100μs for a failed one.

The index and substr solutions were still very fast. For success/failure, index took 11.83μs / 11.86μs, and substr took 4.09μs / 4.15μs. Moving the code into a separate function added about 0.222±0.05μs.
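The original benchmark script is only available at the link below. As a hypothetical reconstruction of the setup described above (not the original script), the following builds a random 20,000-character string and a copy that differs only in its last byte, then uses the core `Benchmark` module to compare an inline `substr` test against the same test wrapped in `begins_with`, which is where the function-call overhead shows up. The lowercase character set and the iteration count are assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub begins_with { return substr($_[0], 0, length($_[1])) eq $_[1] }

# Assumption: random lowercase letters; the original character set is unknown.
my @chars  = ('a' .. 'z');
my $string = join '', map { $chars[rand @chars] } 1 .. 20_000;

# Needle equal to the string (success) and one whose last byte differs (failure).
my $same = $string;
my $diff = $string;
substr($diff, -1, 1) = substr($string, -1, 1) eq 'a' ? 'b' : 'a';

for my $needle ($same, $diff) {
    # Compare the inline expression with the same expression behind a sub call.
    cmpthese(100_000, {
        inline => sub { my $r = substr($string, 0, length $needle) eq $needle },
        called => sub { my $r = begins_with($string, $needle) },
    });
}
```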

The benchmark code is at: http://codepaste.net/2k1y8e

I don't know what is special about @Stephane's data, but my advice stands.

Answer 2

rindex $string, $substring, 0

Searching for $substring in $string at a position <= 0 can only succeed if $substring is a prefix of $string. Example:

> rindex "abc", "a", 0
0
> rindex "abc", "b", 0
-1
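Wrapped as a predicate, the rindex approach looks like this. With a position of 0, rindex searches backwards from offset 0, so the only possible hit is at offset 0 itself: it returns 0 for a prefix and -1 otherwise. The sub name `begins_with_rindex` is hypothetical.

```perl
use strict;
use warnings;

# rindex($string, $needle, 0) searches backwards starting at offset 0,
# so it can only report a match that begins at offset 0.
sub begins_with_rindex {
    my ($string, $needle) = @_;
    return rindex($string, $needle, 0) == 0;
}

for my $needle ('a', 'b', 'abcd', '') {
    printf "'abc' starts with '%s': %s\n",
        $needle, begins_with_rindex('abc', $needle) ? 'yes' : 'no';
}
```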

