Replace video tags in an HTML string
The HTML string is:
"<div>\r\n<video controls=\"controls\" height=\"313\" id=\"video201643154436\" poster=\"/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg\" width=\"500\"><source src=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\" type=\"video/mp4\" />Your browser doesn't support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n<video controls=\"controls\" height=\"300\" id=\"video201644152011\" poster=\"\" width=\"400\"><source src=\"/uploads/ckeditor/attachments/24/test.mp4\" type=\"video/mp4\" />Your browser doesn't support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/24/test.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<p> </p>\r\n</div>\r\n"
I want to replace every video tag, including its content and child tags, with [[ Video ]].
The expected output is:
"<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n[[ Video ]]\r\n</div>\r\n\r\n<p> </p>\r\n</div>\r\n"
I tried the regular expression /<video\s(.*?)<\/video(?=[>])>/, but it does not work.
Parsing HTML with regular expressions is a very hard task. I suggest using nokogiri or a similar gem to parse it into an AST and replace the nodes you need.
I think you need to replace these two exact strings, plus the content between them.
First, the opening and closing strings:
"<video "
"</video>"
puts html_text.gsub("<video ","[[ video ]] ").gsub('</video>',"[[ video ]]")
This should work:
irb(main):020:0> <div>
[[ video ]] controls="controls" height="313" id="video201643154436" poster="/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg" width="500"><source src="/uploads/ckeditor/attachments/23/newtons_law.mp4" type="video/mp4" />Your browser doesn't support video.<br />
Please download the file: <a href="/uploads/ckeditor/attachments/23/newtons_law.mp4">video/mp4</a>[[ video ]]
</div>
<div>test description</div>
<div>
<div>
[[ video ]] controls="controls" height="300" id="video201644152011" poster="" width="400"><source src="/uploads/ckeditor/attachments/24/test.mp4" type="video/mp4" />Your browser doesn't support video.<br />
Please download the file: <a href="/uploads/ckeditor/attachments/24/test.mp4">video/mp4</a>[[ video ]]
</div>
<p> </p>
</div>
=> true
Or use a regular expression:
puts html_text.gsub(/<\/?video[\s>]/, "[[ video ]]")
<div>
[[ video ]]controls="controls" height="313" id="video201643154436" poster="/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg" width="500"><source src="/uploads/ckeditor/attachments/23/newtons_law.mp4" type="video/mp4" />Your browser doesn't support video.<br />
Please download the file: <a href="/uploads/ckeditor/attachments/23/newtons_law.mp4">video/mp4</a>[[ video ]]
</div>
<div>test description</div>
<div>
<div>
[[ video ]]controls="controls" height="300" id="video201644152011" poster="" width="400"><source src="/uploads/ckeditor/attachments/24/test.mp4" type="video/mp4" />Your browser doesn't support video.<br />
Please download the file: <a href="/uploads/ckeditor/attachments/24/test.mp4">video/mp4</a>[[ video ]]
</div>
<p> </p>
</div>
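Finally, to replace the whole element, the pattern has to consume everything between the opening and closing tags. The problem is the \n characters inside the tag, which . does not match by default. Ruby's regex modifiers help here: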
/.*/m multiline: . matches newline
/.*/i ignore case
/.*/x extended: ignore whitespace in pattern
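Of these three modifiers, m is the one that matters for this input. A minimal sketch on a shortened, made-up sample string (SAMPLE and the two constant names are assumptions for illustration, not part of the question):

```ruby
# Shortened, hypothetical sample with a line break inside the <video> element.
SAMPLE = "<div>\r\n<video controls=\"controls\"><source src=\"a.mp4\" />fallback<br />\r\ntext</video>\r\n</div>"

# Same pattern twice; only the m modifier differs.
WITHOUT_M = /<video\s.*?<\/video>/
WITH_M    = /<video\s.*?<\/video>/m

puts SAMPLE.match?(WITHOUT_M)  # false: . stops at the \n inside the element
puts SAMPLE.match?(WITH_M)     # true:  . now matches \n as well

# With m, the whole element collapses into the placeholder.
puts SAMPLE.gsub(WITH_M, "[[ video ]]").inspect
```

The i modifier only matters if the markup may contain upper-case tags such as <VIDEO>, and x has no effect on a pattern without literal whitespace.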
So finally, putting everything together, the regular expression is:
puts html_text.gsub(/<video\s.*?<\/video>/mix, "[[ video ]]")
Result:
irb(main):043:0> <div>
[[ video ]]
</div>
<div>test description</div>
<div>
<div>
[[ video ]]
</div>
<p> </p>
</div>
=> true
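The pattern above requires whitespace right after <video, so a bare <video> tag without attributes would be skipped. A slightly more tolerant variant could look like the sketch below. It assumes attribute values never contain a > character and that video elements are not nested; replace_videos and VIDEO_ELEMENT are hypothetical names.

```ruby
# Hypothetical pattern: \b allows "<video>" as well as "<video ...>",
# and \s* tolerates whitespace before the closing ">".
VIDEO_ELEMENT = /<video\b[^>]*>.*?<\/video\s*>/mi

# Hypothetical helper; the placeholder default mirrors the question.
def replace_videos(html, placeholder = "[[ Video ]]")
  # Block form: the placeholder is inserted literally, so backslashes in it
  # are never interpreted as backreferences.
  html.gsub(VIDEO_ELEMENT) { placeholder }
end

sample = "<div>\r\n<video controls=\"controls\"><source src=\"a.mp4\" />x<br />\r\ny</video>\r\n</div>\r\n<VIDEO>z</VIDEO>"
puts replace_videos(sample).inspect
```

This is still a regex over markup, so it remains fragile for input you do not control.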
anquegi's solution works very well. In the meantime, I tried nokogiri:
str = "<div>\r\n<video controls=\"controls\" height=\"313\" id=\"video201643154436\" poster=\"/uploads/ckeditor/pictures/18/content_56883622_18f242e114.jpg\" width=\"500\"><source src=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\" type=\"video/mp4\" />Your browser doesn't support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/23/newtons_law.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<div>test description</div>\r\n\r\n<div>\r\n<div>\r\n<video controls=\"controls\" height=\"300\" id=\"video201644152011\" poster=\"\" width=\"400\"><source src=\"/uploads/ckeditor/attachments/24/test.mp4\" type=\"video/mp4\" />Your browser doesn't support video.<br />\r\nPlease download the file: <a href=\"/uploads/ckeditor/attachments/24/test.mp4\">video/mp4</a></video>\r\n</div>\r\n\r\n<p> </p>\r\n</div>\r\n"
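require "nokogiri"

# Parse the string as a full HTML document (Nokogiri adds <html> and <body>).
doc = Nokogiri::HTML(str)
doc.css("video").each do |video|
  # Build a <p> holding the placeholder and swap it in for the <video> node.
  new_node = doc.create_element "p"
  new_node.inner_html = "[[ Video ]]"
  video.replace new_node
end
# Serialize the <body> element; the result includes the <body> tags themselves.
new_str = doc.css("body").to_s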