Prevent Python ElementTree from escaping HTML assigned to an Element's text

I'm trying to use HTML data inside an element's text node, but it gets escaped as if it weren't HTML data.

Here is a MWE:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.text = data
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

The output is...

<p>&lt;a href="https://example.com"&gt;Example data gained from elsewhere.&lt;/a&gt;</p>

What I want is...

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>

You can parse the HTML string into an Element (this works as long as the string is well-formed XML) and append it to the tree:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.append(ET.fromstring(data))
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)
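
Note that ET.fromstring expects exactly one root element, so a fragment with leading text or several sibling tags will not parse directly. One possible workaround, sketched here under the assumption that the fragment is otherwise well-formed XML, is to wrap it in a temporary root and move the pieces across. The helper name append_fragment and the <wrapper> tag are made up for this illustration and are not part of ElementTree:

```python
from xml.etree import ElementTree as ET

def append_fragment(parent, fragment):
    """Hypothetical helper: graft a well-formed XML/XHTML fragment into parent."""
    # Wrap the fragment so the parser sees a single root element.
    wrapper = ET.fromstring('<wrapper>' + fragment + '</wrapper>')
    # Text before the first child tag lives in wrapper.text.
    if wrapper.text:
        if len(parent):
            # parent already has children: the text follows the last one
            last = parent[-1]
            last.tail = (last.tail or '') + wrapper.text
        else:
            parent.text = (parent.text or '') + wrapper.text
    # Child elements carry their own trailing text in .tail.
    parent.extend(list(wrapper))
    return parent

p = ET.Element('p')
append_fragment(p, 'See <a href="https://example.com">this link</a> and <b>this</b>.')
html = ET.tostring(p, encoding='unicode', method='html')
print(html)
# <p>See <a href="https://example.com">this link</a> and <b>this</b>.</p>
```

Passing encoding='unicode' makes tostring return a str, so no .decode() call is needed.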

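Your approach is wrong. You are assigning p.text = data, which treats the string as plain text content, so the markup naturally gets escaped. You have to add it as a child element instead, as shown below: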

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

d = ET.fromstring(data)
p = ET.Element('p')

p.append(d)
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

This gives the output

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>
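
If the data comes from elsewhere, it may not always be well-formed XML (for example an unclosed <br>), in which case ET.fromstring raises ET.ParseError. A minimal sketch of one way to cope, assuming that falling back to escaped text is acceptable for your use case; the helper name set_html_or_text is hypothetical:

```python
from xml.etree import ElementTree as ET

def set_html_or_text(parent, data):
    """Hypothetical helper: append data as a child element, or fall back to text."""
    try:
        # Works only when data is a single well-formed element.
        parent.append(ET.fromstring(data))
    except ET.ParseError:
        # Not parseable as XML: keep it as text, which will be escaped on output.
        parent.text = data
    return parent

ok = ET.tostring(
    set_html_or_text(ET.Element('p'), '<a href="https://example.com">link</a>'),
    encoding='unicode', method='html')
bad = ET.tostring(
    set_html_or_text(ET.Element('p'), 'line one<br>line two'),
    encoding='unicode', method='html')

print(ok)   # <p><a href="https://example.com">link</a></p>
print(bad)  # <p>line one&lt;br&gt;line two</p>
```

For real-world HTML that is not valid XML, a dedicated HTML parser is usually a better fit than xml.etree.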