Why is finding a node with the desired text faster with Ruby than with XPath?

Question

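Recently I had to check whether HTML nodes contain the desired text. To my surprise, when I refactored the code to use an XPath selector, it became more than 10 times slower. Here is a simplified version of the original code, with a benchmark: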

# has_keyword_benchmark.rb
require 'benchmark'
require 'nokogiri'

Doc = Nokogiri("
<div>
  <div>
    A
  </div>
  <p>
    <b>A</b>
  </p>
  <span>
    B
  </span>
</div>")

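# XPath version: selects the document's top-level element if its
# string-value (all descendant text) contains "A"
def has_keywords_with_xpath
  Doc.xpath('./*[contains(., "A")]').size > 0
end

# Plain Ruby version: concatenates all text in the document and searches it
def has_keywords_with_ruby
  Doc.text.include? 'A'
end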

iterations = 10_000
Benchmark.bm(27) do |bm|
  bm.report('checking if has keywords with xpath') do
    iterations.times do
      has_keywords_with_xpath
    end
  end

  bm.report('checking if has keywords with ruby') do
    iterations.times do
      has_keywords_with_ruby
    end
  end
end

When I run ruby has_keyword_benchmark.rb, I get:

                                  user     system      total        real
checking if has keywords with xpath  0.400000   0.020000   0.420000 (  0.428484)
checking if has keywords with ruby  0.020000   0.000000   0.020000 (  0.023773)

Intuitively, checking whether a node contains some text should be faster with XPath, but it isn't. Does anyone know why?

Answer 1

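It is common for parsing and compiling an XPath expression to take far longer than actually executing it, even on quite large documents. For example, with Saxon, when running the expression count(//*[contains(., 'e')]) against a 1 MB source document, compiling the path expression takes 200 ms, while executing it takes about 18 ms.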
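The compile/execute split can be measured for any expression language that separates the two steps. The sketch below uses Ruby's built-in Regexp as a stand-in (an assumption for illustration only: it is not XPath and says nothing about Nokogiri's or Saxon's internals). It times building patterns from a source string separately from running them, using Process.clock_gettime with a monotonic clock. The helper name time_compile_and_execute is hypothetical.

```ruby
# Hypothetical helper: times the "compile" phase (building a Regexp from a
# source string) separately from the "execute" phase (matching against text).
# Regexp is only a stand-in for an XPath engine; it is not XPath.
def time_compile_and_execute(source, text, iterations)
  clock = Process::CLOCK_MONOTONIC

  # Compile phase: build a fresh Regexp object on every iteration.
  start = Process.clock_gettime(clock)
  patterns = Array.new(iterations) { Regexp.new(source) }
  compile_seconds = Process.clock_gettime(clock) - start

  # Execute phase: run each already-compiled pattern once against the text.
  start = Process.clock_gettime(clock)
  matches = patterns.count { |re| re.match?(text) }
  execute_seconds = Process.clock_gettime(clock) - start

  { compile: compile_seconds, execute: execute_seconds, matches: matches }
end

result = time_compile_and_execute('A', "\n  A\n", 10_000)
puts format('compile: %.4fs  execute: %.4fs', result[:compile], result[:execute])
```

The absolute numbers depend on the machine and the pattern; what matters is the shape of the measurement. If the compile column dominates, as in the Saxon figures, the remedy is to move compilation out of the loop.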

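If your XPath API lets you compile an XPath expression once and then execute it repeatedly (or if it caches compiled expressions behind the scenes), it is definitely worth taking advantage of that feature.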
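A minimal sketch of the compile-once idea, again using Regexp as a stand-in for a compiled XPath object. This is an assumption for illustration; whether a particular XPath library exposes compiled expressions has to be checked in its own documentation. The class CompiledPatternCache and the method has_keyword_cached? are hypothetical. The cache memoizes the compiled form per source string, so repeated calls with the same expression pay the compilation cost only once; Regexp.escape makes the keyword match literally.

```ruby
# Hypothetical cache: compiles each distinct pattern source once and
# returns the same compiled object on every later lookup.
class CompiledPatternCache
  def initialize
    @cache = {}
  end

  # Returns the compiled Regexp for +source+, compiling it on first use only.
  def fetch(source)
    @cache[source] ||= Regexp.new(source)
  end

  # Number of distinct patterns compiled so far.
  def size
    @cache.size
  end
end

CACHE = CompiledPatternCache.new

# Analogue of has_keywords_with_xpath: the expression is compiled once,
# and later calls only execute it.
def has_keyword_cached?(text, keyword)
  CACHE.fetch(Regexp.escape(keyword)).match?(text)
end

puts has_keyword_cached?("<b>A</b>", 'A') # prints true
puts has_keyword_cached?("B", 'A')        # prints false
```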

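The actual XPath execution is likely to be at least as fast as your hand-written navigation code, and possibly faster. It is the preparation work that causes the overhead.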
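The following hand-written equivalent runs over a toy tree and shows what the predicate contains(., "A") actually computes. The Element struct and the method names are hypothetical and are not Nokogiri classes. In XPath 1.0 the string-value of an element is the concatenation of all its descendant text nodes in document order, so the predicate is a substring test on that string. For the question's document this is essentially what Doc.text.include?('A') does.

```ruby
# Toy DOM: an element has a name and children; text nodes are plain Strings.
Element = Struct.new(:name, :children)

# XPath 1.0 string-value: all descendant text concatenated in document order.
def string_value(node)
  return node if node.is_a?(String)
  node.children.map { |child| string_value(child) }.join
end

# Hand-written equivalent of the XPath predicate contains(., keyword).
def contains_text?(node, keyword)
  string_value(node).include?(keyword)
end

# Rough model of the question's document (whitespace simplified).
root = Element.new('div', [
  Element.new('div',  ["\n    A\n  "]),
  Element.new('p',    [Element.new('b', ['A'])]),
  Element.new('span', ["\n    B\n  "])
])

puts contains_text?(root, 'A')             # prints true
puts contains_text?(root.children[2], 'A') # prints false (the span only holds "B")
```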


Tags: ruby, benchmarking, xpath, nokogiri