etree parsing xml with escaped html inside

Question

I have an xml file that contains escaped html. A field looks like this:

<title>Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title</title>

I can find the element just fine:

el = titles.find("x:title", NS)

But when I do:

el.text

it returns the text with the tags unescaped:

'Some records title with html <i> This should be inside escaped html </i>, end of the title'

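Why is that? Even though the html is already escaped in the file, do I have to escape the tags again myself? I want to be able to put both escaped and unescaped html tags into the xml (sometimes to show them as text, sometimes as formatted text). How do I provide that correctly?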

Answer 1

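With ElementTree you can use the _escape_attrib() function. Note that the leading underscore marks it as a private, undocumented helper, so it may change between Python versions: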

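import xml.etree.ElementTree as ET

# Sample document: the <i> tags are stored as escaped entities
text = '''<title>Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title</title>
'''

root = ET.fromstring(text)

# root.text holds the decoded characters; escape them again for output
print(ET._escape_attrib(root.text))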

This will output: Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title
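
Since _escape_attrib() is private, a documented alternative may be safer. Below is a minimal sketch, assuming the same sample element as in the question and using xml.sax.saxutils.escape from the standard library; the variable names are only illustrative. It also hints at why el.text comes back unescaped: the parser decodes entity references when reading, and ET.tostring() escapes the text again when writing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same sample element as in the question (assumed for illustration)
xml_source = (
    "<title>Some records title with html &lt;i&gt; This should be inside "
    "escaped html &lt;/i&gt;, end of the title</title>"
)

root = ET.fromstring(xml_source)

# The parser decodes entity references, so .text holds literal < and >
raw_text = root.text

# escape() turns &, < and > back into entity references
escaped_text = escape(raw_text)

# Serialising the element escapes its text automatically
serialized = ET.tostring(root, encoding="unicode")

print(raw_text)
print(escaped_text)
print(serialized)
```

So re-escaping by hand is probably only needed when the value of el.text is placed into markup directly; writing the tree back out with ElementTree handles the escaping itself.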
