How to add trademark symbols
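I am trying to add a trademark symbol to every instance of "Imagination Playground" in an HTML document. But I end up with something like this:
&lt;i class="fa fa-trademark"&gt;&lt;/i&gt;
The symbols I use seem to get converted to HTML entities. How can I avoid that escaping?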
Here is my original Ruby code:
require 'nokogiri'
body = "<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground , we've got webinars for you in March!</p>
<p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>"
new_body = Nokogiri::HTML(body)
new_body.encoding = 'UTF-8'
new_body.css('p', 'a').each do |p|
  p.content = p.content.gsub(/Imagination Playground\s/, 'Imagination Playground<i class="fa fa-trademark"></i>')
end
puts new_body
This is what I get:
<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground&lt;i class="fa fa-trademark"&gt;&lt;/i&gt;, we've got webinars for you in March!</p>
<p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>
How can I replace text in the HTML paragraphs without the symbols and special characters being escaped?
Here's how I'd do it:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground , we've got webinars for you in March!</p>
<p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>
EOT
doc.encoding = 'UTF-8'
doc.css('p').each do |p|
  p.children = p.content.gsub(/Imagination Playground\s/, 'Imagination Playground<i class="fa fa-trademark"></i>')
end
puts doc
Which results in:
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground<i class="fa fa-trademark"></i>, we've got webinars for you in March!</p>
# >> <p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>
# >> </body></html>
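Nokogiri is smart. When it sees children=, it checks whether it is receiving a string. If so, it parses that string into nodes and replaces the existing children with them. That is very different from content=, where Nokogiri knows the value is supposed to be text and encodes embedded tags as &lt;, and so on. This is covered in the documentation.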
For children=:
Set the inner html for this Node node_or_tags node_or_tags can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a string containing markup.
For content=:
Set the Node's content to a Text node containing string. The string gets XML escaped, not interpreted as markup.
This would not work if I want to preserve the HTML tags that are inside the paragraph. Try it with <p>fsome test and then <b>bold</b></p>
You are changing the requirements. Don't do that. Be specific about what you need so we can answer the real question the first time.
It takes a small change to work on the content of the desired tag. Use children.to_html to get the embedded nodes as an HTML string, then gsub it and assign the result:
require 'nokogiri'
doc = Nokogiri::HTML('<p>Imagination Playground<b>foo</b></p>')
puts doc.to_html
It starts out looking like this:
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Imagination Playground<b>foo</b></p></body></html>
Modifying the DOM:
doc.search('p').each do |p|
  p.children = p.children.to_html.gsub(/Imagination Playground\s?/, 'Imagination Playground<i class="fa fa-trademark"></i>')
end
puts doc
It now looks like:
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Imagination Playground<i class="fa fa-trademark"></i><b>foo</b></p></body></html>
Note that I'm using search instead of css. Prefer the generic method over the more specific one; it makes it easier to switch to XPath if you need to.
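Also, I used a smarter pattern in gsub that conditionally grabs a single trailing space if one is present. That isn't necessary with HTML, because browsers gobble whitespace, but if you are working with regular text documents or preformatted text it would be the right way to go.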
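The difference between the two patterns can be checked with plain strings, no Nokogiri required. This is only a sketch; the sample strings and variable names are invented for the comparison.

```ruby
mark        = '<i class="fa fa-trademark"></i>'
replacement = "Imagination Playground#{mark}"

strict  = /Imagination Playground\s/   # requires one trailing whitespace character
lenient = /Imagination Playground\s?/  # takes a trailing whitespace character only if present

with_space    = 'about Imagination Playground , we have webinars'
without_space = 'Imagination Playground<b>foo</b>'

# With a trailing space, both patterns match and consume that space.
strict_spaced  = with_space.gsub(strict, replacement)
lenient_spaced = with_space.gsub(lenient, replacement)

# Directly before a tag there is no whitespace, so only the lenient pattern matches.
strict_tight  = without_space.gsub(strict, replacement)
lenient_tight = without_space.gsub(lenient, replacement)

puts strict_spaced
puts lenient_spaced
puts strict_tight
puts lenient_tight
```

The strict pattern leaves the second string untouched, which is why the earlier code would have skipped a name that sits directly against a following tag.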
And here is a bit more detail about what Nokogiri sees:
doc.search('p').first
# => #(Element:0x3fd222462204 {
# name = "p",
# children = [
# #(Text "Imagination Playground"),
# #(Element:0x3fd2224608f0 { name = "b", children = [ #(Text "foo")] })]
# })
doc.search('p').first.children
# => [#<Nokogiri::XML::Text:0x3fd222461688 "Imagination Playground">, #<Nokogiri::XML::Element:0x3fd2224608f0 name="b" children=[#<Nokogiri::XML::Text:0x3fd22245fe64 "foo">]>]