Replacing XML entities (&amp; &lt; &gt; &nbsp;) with placeholders

I have an XML file like this:

<Table>
<Row>
    <Cell>text id</Cell>
    <Cell>First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt;</Cell>
    <Cell>Second Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt;</Cell>
</Row>
.
.
</Table>

With the following code, I collect the text of every <Cell></Cell> element into a list variable.

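import xml.etree.ElementTree as ET

tree = ET.parse(file_path)  # file_path: path to the XML file shown above
root = tree.getroot()
cell_texts = []  # text of every <Cell>, in document order
for row in root.iter(tag='Row'):
    for cell in row:
        cell_texts.append(cell.text)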

The problem is that the parser decodes the entities, so First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt; becomes First Text<br/>&nbsp;<br/>
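The decoding can be reproduced without a file. The snippet below is a minimal sketch that parses a shortened, inline copy of the sample (the `xml_text` string and the `texts` name are assumptions for illustration) and prints what `cell.text` returns.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample file, for illustration only.
xml_text = (
    "<Table><Row>"
    "<Cell>text id</Cell>"
    "<Cell>First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt;</Cell>"
    "</Row></Table>"
)

root = ET.fromstring(xml_text)
texts = [cell.text for row in root.iter("Row") for cell in row]

# The parser has already turned &lt; &gt; &amp; into < > &
print(texts[1])  # First Text<br/>&nbsp;<br/>
```

This is standard XML behaviour: the predefined entities are resolved before the text reaches the program, so any placeholder scheme has to run on the raw file contents, before parsing.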

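I want to replace these entity characters with placeholders that I assign myself, because I have to restore them to their original form later in my program. For example: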

special character -> [placeholder]
&lt;br/&gt; -> [br/]
&amp; -> [amp]
&nbsp; -> [nbsp]
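One way to apply such a mapping to the raw text is a single regular-expression pass, which avoids depending on the order of several replace() calls. This is only a sketch under assumptions: the `PLACEHOLDERS` dict and the `to_placeholders` function are hypothetical names, and the mapping contains just the three example pairs.

```python
import re

# Hypothetical mapping built from the example pairs; extend as needed.
PLACEHOLDERS = {
    "&lt;br/&gt;": "[br/]",
    "&amp;": "[amp]",
    "&nbsp;": "[nbsp]",
}

# Longest keys first, so '&lt;br/&gt;' is tried before any shorter entity.
_keys = sorted(PLACEHOLDERS, key=len, reverse=True)
_pattern = re.compile("|".join(re.escape(k) for k in _keys))


def to_placeholders(raw: str) -> str:
    """Replace every mapped entity in raw (unparsed) XML text."""
    return _pattern.sub(lambda m: PLACEHOLDERS[m.group(0)], raw)


print(to_placeholders("First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt;"))
# First Text[br/][amp]nbsp;[br/]
```

Note that the sample file contains &amp;nbsp; rather than &nbsp;, so with this mapping the text "nbsp;" is left behind after [amp]. That still restores correctly as long as the reverse mapping turns [amp] back into &amp;.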

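I found a way. Before parsing the XML, I read every line with readlines(), replace the entity characters with placeholders, and write the result to a new XML file. After translation, I can use replace() to turn the placeholders back into their original form.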

First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt; => First Text[H1][H4][H6][H1]

# Order matters: '&lt;br/&gt;' must be replaced before '&lt;' and '&gt;',
# and '&amp;' is replaced before '&nbsp;'.
replacements = [
    ('&lt;br/&gt;', '[H1]'),
    ('&lt;', '[H2]'),
    ('&gt;', '[H3]'),
    ('&amp;', '[H4]'),
    ('&nbsp;', '[H5]'),
    ('nbsp;', '[H6]'),  # what remains of '&amp;nbsp;' once '&amp;' became [H4]
]

# Opening the output with "w" truncates it once; both files are closed automatically.
with open(r"C:\Users\USER NAME\PycharmProjects\FILE TEST\source_file.xml", "r", encoding="utf-8") as file1, \
        open(r"C:\Users\USER NAME\PycharmProjects\FILE TEST\edited_file.xml", "w", encoding="utf-8") as file2:
    for line in file1:
        for entity, placeholder in replacements:
            line = line.replace(entity, placeholder)
        file2.write(line)
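The restore step is described but not shown. A minimal sketch of it, assuming the same [H1] to [H6] pairs, could run the table in reverse; the `REPLACEMENTS`, `encode` and `restore` names are hypothetical and only serve the illustration.

```python
# Same (entity, placeholder) pairs as in the preprocessing step.
REPLACEMENTS = [
    ("&lt;br/&gt;", "[H1]"),
    ("&lt;", "[H2]"),
    ("&gt;", "[H3]"),
    ("&amp;", "[H4]"),
    ("&nbsp;", "[H5]"),
    ("nbsp;", "[H6]"),
]


def encode(text: str) -> str:
    """Entities -> placeholders, in the listed order."""
    for entity, placeholder in REPLACEMENTS:
        text = text.replace(entity, placeholder)
    return text


def restore(text: str) -> str:
    """Placeholders -> entities; the tokens never overlap, so order is free."""
    for entity, placeholder in REPLACEMENTS:
        text = text.replace(placeholder, entity)
    return text


original = "First Text&lt;br/&gt;&amp;nbsp;&lt;br/&gt;"
encoded = encode(original)
print(encoded)                       # First Text[H1][H4][H6][H1]
print(restore(encoded) == original)  # True
```

The round trip only holds if the source text never contains a literal token such as [H1]; if that can happen, less collision-prone placeholders would be a safer choice.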