Extract an element's literal text using ElementTree

Question

I have the following XML with a <Description> tag whose text contains special characters.

<branch>
   <Description>
      Here are few steps to make these settings
      1)    Tools &lt;&lt; Internet options 2)  Click on General tab
   </Description>
</branch>

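Now, when I try to retrieve the description text, I get a result in which &lt; has been automatically converted to <. The code and the output I expect are as follows.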

Code -

from xml.etree import ElementTree as ET

# Copy the XML above into a file and set inputFile to its path
tree = ET.parse(inputFile)
root = tree.getroot()

for description in root.iter('Description'):
    # .text holds the parsed text, so &lt; has already been decoded to <
    print(description.text)

I need the literal string inside the Description tag, exactly as it is written in the file. How can I get it?

Expected -

Here are few steps to make these settings
          1)    Tools &lt;&lt; Internet options 2)    Click on General tab

Answer 1

You can simply re-escape the content with html.escape():

import html
from xml.etree import ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

for description in root.iter('Description'):
    print(html.escape(description.text))

Result:

Here are few steps to make these settings
1)    Tools &lt;&lt; Internet options 2)  Click on General tab
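
One caveat with this approach: by default html.escape() also escapes quote characters (" becomes &quot; and ' becomes &#x27;), which may not match what the source file contained. Below is a minimal, self-contained sketch that passes quote=False so only &, < and > are re-escaped. It assumes the XML is held in a string (XML_SOURCE) instead of a file, and the helper name escaped_descriptions is made up for illustration. Note that this re-escapes the parsed text rather than recovering the original bytes, so it cannot tell whether the file used an entity, a character reference, or CDATA.

```python
import html
from xml.etree import ElementTree as ET

# Same XML as in the question, kept inline so the sketch runs on its own.
XML_SOURCE = """<branch>
   <Description>
      Here are few steps to make these settings
      1)    Tools &lt;&lt; Internet options 2)  Click on General tab
   </Description>
</branch>"""


def escaped_descriptions(xml_source):
    """Return the text of each <Description> with &, < and > re-escaped.

    Hypothetical helper, not part of ElementTree.
    """
    root = ET.fromstring(xml_source)
    results = []
    for description in root.iter('Description'):
        # .text is None for an empty element, so fall back to ''.
        text = description.text or ''
        # quote=False leaves " and ' untouched; only &, < and > are escaped.
        results.append(html.escape(text, quote=False))
    return results


for escaped in escaped_descriptions(XML_SOURCE):
    print(escaped)
```

The printed text keeps the element's leading and trailing whitespace; call .strip() on it if that is not wanted.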

python

elementtree