Read and write special characters in XML with minidom
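I am trying to write and read back a set of strings in elements called object, each of which has two attributes: name (a simple string) and body. The body is a string containing special characters such as "\n" and "\". I am using the following code to write the XML file: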
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement('data')
doc.appendChild(root)

# create a scene
scene = doc.createElement('scene')
root.appendChild(scene)

# add object element ('obj' avoids shadowing the built-in 'object')
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')
# '\\a' is a literal backslash followed by 'a'; a bare '\a' would be the BEL character
txt = 'Text\nsome text\nanother one\\and so on\n'
obj.setAttribute('body', txt)
scene.appendChild(obj)

# write to a file
with open("filename.xml", "wb") as file_handle:
    file_handle.write(bytes(doc.toprettyxml(indent='\t'), 'UTF-8'))
which produces this file:
<?xml version="1.0" ?>
<data>
	<scene>
		<object body="Text
some text
another one\and so on
" name="obj1"/>
	</scene>
</data>
and to parse it I use:
from xml.dom import minidom

filepath = 'filename.xml'
dom = minidom.parse(filepath)
scenes = dom.getElementsByTagName('scene')
for scene in scenes:
    txt_objs = scene.getElementsByTagName('object')
    for obj in txt_objs:
        obj_name = obj.getAttribute('name')
        obj_body = obj.getAttribute('body')
        print(obj_name, " ", obj_body)
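The parser's output is not the same as what was stored: the newline characters are lost. How can I get the output to match the input?
#parser output
obj1 Text some text another one\and so on
What is the correct way to store and retrieve strings that contain special characters?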
The behavior minidom exhibits is consistent with the W3C recommendation. See the following discussion: "Are line breaks in XML attribute values valid?". I am quoting @JanCetkovsky's answer here for easy reference:
It is valid, however according to W3C recommendation your XML parser should normalize the all whitespace characters to space (0x20) - so the output of your examples will differ (you should have new line on the output for "&#13;&#10;", but only space in the first case). [Source]
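To see the normalization the quoted answer describes, here is a minimal sketch. The two sample documents are made up for illustration: one has a literal line break inside the attribute value, the other writes the same line break as a character reference.

```python
from xml.dom import minidom

# A literal line break inside an attribute value: the parser
# normalizes it to a single space (0x20).
literal = minidom.parseString('<object body="line 1\nline 2"/>')
literal_body = literal.documentElement.getAttribute('body')

# The same line break written as the character reference &#10;
# is not normalized and comes back as a real newline.
referenced = minidom.parseString('<object body="line 1&#10;line 2"/>')
referenced_body = referenced.documentElement.getAttribute('body')

print(repr(literal_body))     # 'line 1 line 2'
print(repr(referenced_body))  # 'line 1\nline 2'
```

This is the same thing that happens to the body attribute in the question: the file contains literal line breaks, so they are read back as spaces.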
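If you control the XML document structure (and it looks like you do, since you are building the XML yourself), store the text as the XML element's value instead of as an attribute value: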
.....
#add object element
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')
txt = 'Text\nsome text\nanother one\\and so on\n'
txt_node = doc.createTextNode(txt)
obj.appendChild(txt_node)
scene.appendChild(obj)
.....
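The snippet above only covers the writing side. A minimal sketch of the full round trip could look like the following, assuming the body is stored as the text content of each object element. The helper name `read_body` is hypothetical, and the document is serialized to a string with `toxml()` rather than to a file, so that no indentation whitespace can end up around the text.

```python
from xml.dom import minidom

txt = 'Text\nsome text\nanother one\\and so on\n'

# Build the document with the body stored as element text.
doc = minidom.Document()
root = doc.createElement('data')
doc.appendChild(root)
scene = doc.createElement('scene')
root.appendChild(scene)
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')
obj.appendChild(doc.createTextNode(txt))
scene.appendChild(obj)

# toxml() adds no indentation, so the text node is written as given.
xml_text = doc.toxml()


def read_body(element):
    """Hypothetical helper: join the data of all text children of an element."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Parse the serialized document and read the body back.
dom = minidom.parseString(xml_text)
for parsed in dom.getElementsByTagName('object'):
    name = parsed.getAttribute('name')
    body = read_body(parsed)
    print(name, repr(body))
```

Unlike attribute values, element text is not whitespace-normalized by the parser, so the line breaks and the backslash come back unchanged.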