Read and Write special characters in XML with minidom

Question

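I'm trying to write and read back a set of strings in an element called object, which has two attributes: name (a simple string) and body, a string that contains the special characters "\n" and "\". I'm using the following code to write the XML file: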

from xml.dom.minidom import Document

doc = Document()
root = doc.createElement('data')
doc.appendChild(root)
# create a scene
scene = doc.createElement('scene')
root.appendChild(scene)
# add object element (named obj so the built-in object is not shadowed)
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')

# '\\' is needed for a literal backslash in a normal string literal
txt = 'Text\nsome text\nanother one\\and so on\n'
obj.setAttribute('body', txt)
scene.appendChild(obj)

# write to a file (the with block closes it automatically)
with open("filename.xml", "w", encoding="utf-8") as file_handle:
    file_handle.write(doc.toprettyxml(indent='\t'))
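A detail about the `txt` string: in an ordinary (non-raw) Python string literal, `\a` is the escape for the BEL control character (U+0007), not a backslash followed by `a`. A literal backslash has to be written as `\\` or the string made raw. BEL is also not a legal XML 1.0 character, so a document containing it is rejected when parsed. A minimal check using only the standard library (the variable names are just for illustration):

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# In a normal string literal, '\a' is ONE character: BEL (U+0007)
bel = 'another one\and so on'
# '\\a' or a raw string keeps a real backslash followed by 'a'
literal = 'another one\\and so on'
raw = r'another one\and so on'

print(len(bel), len(literal))  # the BEL version is one character shorter
print(literal == raw)          # both spell a literal backslash

# U+0007 is not a legal XML 1.0 character, so expat rejects the document
rejected = False
try:
    minidom.parseString('<object body="%s"/>' % bel)
except ExpatError as err:
    rejected = True
    print('rejected:', err)
```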

It produces this file:

<?xml version="1.0" ?>
<data>
    <scene>
        <object body="Text
some text
another one\and so on
" name="obj1"/>
    </scene>
</data>

And this code to parse it:

from xml.dom import minidom

filepath = 'filename.xml'
dom = minidom.parse(filepath)
scenes = dom.getElementsByTagName('scene')
for scene in scenes:
    txt_objs = scene.getElementsByTagName('object')
    for obj in txt_objs:
        obj_name = obj.getAttribute('name')
        obj_body = obj.getAttribute('body')
        print(obj_name, "  ", obj_body)

The parser output is not the same as what I stored: the newline characters are lost. How can I get back output identical to the input?

#parser output
obj1    Text some text another one\and so on

What is the correct way to store and retrieve strings that contain special characters?

Answer 1

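The behavior minidom shows is consistent with the W3C XML recommendation (attribute-value normalization, section 3.3.3). See the following discussion: "Are line breaks in XML attribute values valid?" I'm quoting @JanCetkovsky's answer here for easy reference: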

It is valid, however according to the W3C recommendation your XML parser should normalize all whitespace characters to space (0x20), so the output of your examples will differ (you should get a new line in the output for the character-reference form, e.g. "&#10;", but only a space in the first case). [Source]
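The normalization can be seen directly with a minimal sketch. minidom parses with expat, which replaces a literal line break inside an attribute value with a space, but appends the character a reference such as `&#10;` names. The XML here is written by hand and parsed with `minidom.parseString`, so the result does not depend on how a particular writer escapes attribute values; `body_of` is only an illustrative helper name.

```python
from xml.dom import minidom

# A literal line break inside the attribute value, as in the question's file
literal_nl = '<object name="obj1" body="Text\nsome text"/>'
# The same value written with the character reference &#10; instead
char_ref = '<object name="obj1" body="Text&#10;some text"/>'

def body_of(xml_text):
    """Parse xml_text and return the body attribute of its root element."""
    return minidom.parseString(xml_text).documentElement.getAttribute('body')

# Attribute-value normalization: the literal newline becomes a space
print(repr(body_of(literal_nl)))  # 'Text some text'
# A character reference is not normalized away, so the newline survives
print(repr(body_of(char_ref)))    # 'Text\nsome text'
```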

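If you control the XML document structure (and it looks like you do, since you build the XML yourself), put the text in the element's content instead of an attribute value: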

.....
#add object element
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')

txt = 'Text\nsome text\nanother one\\and so on\n'
txt_node = doc.createTextNode(txt)
obj.appendChild(txt_node)
scene.appendChild(obj)
.....
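A minimal round-trip sketch of the element-content approach, including the reading side. It works in memory with `toxml()` and `minidom.parseString` for brevity; writing `xml_text` to a file and calling `minidom.parse` on it behaves the same way. `toxml()` is used because it adds no indentation whitespace. The helper `element_text` is a hypothetical name that joins the text children of an element, since the value is no longer an attribute.

```python
from xml.dom import minidom

txt = 'Text\nsome text\nanother one\\and so on\n'

# Build <data><scene><object name="obj1">...text...</object></scene></data>
doc = minidom.Document()
root = doc.createElement('data')
doc.appendChild(root)
scene = doc.createElement('scene')
root.appendChild(scene)
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')
obj.appendChild(doc.createTextNode(txt))  # text goes in the element content
scene.appendChild(obj)

# toxml() writes the element content verbatim, with no added indentation
xml_text = doc.toxml()

def element_text(element):
    """Concatenate the text and CDATA children of an element."""
    return ''.join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))

# Read it back: newlines and the backslash are preserved in element content
results = []
parsed = minidom.parseString(xml_text)
for o in parsed.getElementsByTagName('object'):
    results.append((o.getAttribute('name'), element_text(o)))
    print(o.getAttribute('name'), repr(element_text(o)))
```

The name stays an attribute because it is a simple single-line string; only the multi-line body moves into the element content.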
