How to find multiple substring matches within a string, alter substring enclosures

Question

I'm trying to parse an HTML string with Ruby. The string contains multiple <pre></pre> tags, and I need to find and encode every < and > bracket between each pair of tags.

Example: 

string_1_pre = "<pre><h1>Welcome</h1></pre>"

string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"

def clean_pre_code(html_string)
 matched = html_string.match(/(?<=<pre>).*(?=<\/pre>)/)
 cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
 html_string.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
end

clean_pre_code(string_1_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre>"
clean_pre_code(string_2_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;/pre&gt;&lt;pre&gt;&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>"

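This works as long as html_string contains only one <pre></pre> element, but not if there are multiple.

I'm open to a solution that uses Nokogiri or a similar tool, but I couldn't figure out how to make it do what I want.

Let me know if you need any additional context.

Update: This is only possible with Nokogiri; see the accepted answer.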

Answer 1

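@zstrad44 Yes, you can get it done with Nokogiri. Here is a version of the code that I developed from yours; it gives you the result you need for a string with multiple pre tags.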

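require 'nokogiri'

def clean_pre_code(html_string)
  doc = Nokogiri::HTML(html_string)
  res = ""
  # Visit each <pre> element separately so the greedy regex below
  # can never span more than one element.
  doc.xpath('//pre').each do |pre|
    pre_html = pre.to_html
    # Capture everything between the opening and closing tag.
    matched = pre_html.match(/(?<=<pre>).*(?=<\/pre>)/)
    cleaned = matched.to_s.gsub("<", "&lt;").gsub(">", "&gt;")
    res += pre_html.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
  end
  # Note: only the rewritten <pre> elements are returned, not the rest of the document.
  res
end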

I suggest reading the Nokogiri Cheatsheet to better understand the methods I used in the code. Happy coding! Hope I could help.
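
A side note on why the attempt in the question breaks with two elements: `.*` is greedy, so the match runs from the first <pre> all the way to the last </pre>. Below is a minimal sketch of a regex-only alternative. It assumes the <pre> elements are never nested and that the input is simple enough for a regex; for arbitrary HTML a parser such as Nokogiri is likely the safer choice. The names PRE_BLOCK and escape_pre_contents are made up for this illustration and do not come from any library.

```ruby
# Sketch: escape angle brackets inside every <pre>...</pre> pair.
# Assumption: <pre> elements are not nested. Attributes on <pre> are tolerated.
# (.*?) is lazy, so each match stops at the nearest </pre>;
# the m flag lets . match newlines inside the block.
PRE_BLOCK = %r{(<pre\b[^>]*>)(.*?)(</pre>)}m

def escape_pre_contents(html_string)
  html_string.gsub(PRE_BLOCK) do
    # Copy the captures first; the inner gsub calls overwrite $1..$3.
    opening, inner, closing = $1, $2, $3
    escaped = inner.gsub("<", "&lt;").gsub(">", "&gt;")
    "#{opening}#{escaped}#{closing}"
  end
end

string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"
puts escape_pre_contents(string_2_pre)
# prints <pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre><pre>&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>
```

Unlike the Nokogiri version, this keeps the text outside the <pre> elements in place. It only escapes < and >, as the question asks, and leaves & untouched.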


Tags: ruby, regex, html-encode, ruby-on-rails, nokogiri