ElementTree: why are my namespace declarations stripped out?
I'm building OpenOffice documents. I have a scaffold that I use to generate the content.xml file. The content-scaffold.xml file is stored on the file system and looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
xmlns:db="urn:oasis:names:tc:opendocument:xmlns:database:1.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:odf="http://docs.oasis-open.org/ns/office/1.2/meta/odf#"
xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:pkg="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#"
xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
office:version="1.2">
<office:automatic-styles>
<style:style style:family="text" style:name="Strong">
<style:text-properties
fo:color="#000000"
fo:font-weight="bold" />
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<!-- content will go here -->
</office:text>
</office:body>
</office:document-content>
My idea is to take this XML and inject content into the office:text tag (in Python), then render it back out. In this example, I inject a simple text:p tag.
document_content = ElementTree.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ElementTree.SubElement(office_text, 'text:p')
p.text = "Hello"
However, this is what the namespace declarations look like after rendering:
<office:document-content
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
office:version="1.2">
This causes the following error:
Namespace prefix text on p is not defined
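Clearly, ElementTree only keeps the xmlns declarations that are needed (in my case fo, office and style, because those are the only ones actually used in content-scaffold.xml), and that is neat. However, I really want all of them, so that I can use every namespace.
Any idea how to force ElementTree to keep them? Or am I approaching this the wrong way from the start? I'm open to any alternative solution.
Note: I'm using Python 3 and ElementTree
Thanks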
ElementTree is rather weak when it comes to namespace handling. However, what you ask for can be done (though it is a bit cumbersome):
from xml.etree import ElementTree as ET
NAMESPACES = {"anim": "urn:oasis:names:tc:opendocument:xmlns:animation:1.0",
"chart": "urn:oasis:names:tc:opendocument:xmlns:chart:1.0",
"config": "urn:oasis:names:tc:opendocument:xmlns:config:1.0",
"db": "urn:oasis:names:tc:opendocument:xmlns:database:1.0",
"dc": "http://purl.org/dc/elements/1.1/",
"dr3d": "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0",
"draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
"fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
"form": "urn:oasis:names:tc:opendocument:xmlns:form:1.0",
"grddl": "http://www.w3.org/2003/g/data-view#",
"manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
"math": "http://www.w3.org/1998/Math/MathML",
"meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
"number": "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0",
"odf": "http://docs.oasis-open.org/ns/office/1.2/meta/odf#",
"of": "urn:oasis:names:tc:opendocument:xmlns:of:1.2",
"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
"pkg": "http://docs.oasis-open.org/ns/office/1.2/meta/pkg#",
"presentation": "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0",
"script": "urn:oasis:names:tc:opendocument:xmlns:script:1.0",
"smil": "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0",
"style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
"svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
"table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
"xforms": "http://www.w3.org/2002/xforms",
"xhtml": "http://www.w3.org/1999/xhtml",
"xlink": "http://www.w3.org/1999/xlink"}
document_content = ET.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, 'text:p')
p.text = "Hello"
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # Ensure correct prefixes in output
    if prefix not in ("office", "fo", "style"):  # Prevent duplicate ns declarations
        document_content.set("xmlns:" + prefix, uri)  # Add ns declarations to root element
ET.ElementTree(document_content).write("output.xml")
This code produces a result document that keeps all of the namespace declarations.
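If you would rather not hard-code the dictionary and the list of already-used prefixes, something along the following lines might work. It is only a sketch under a few assumptions: the scaffold is inlined and shortened so the snippet runs on its own, the helper names collect_namespaces and used_uris are made up for illustration, the scaffold is assumed to have no default (unprefixed) namespace, and the new element is created in Clark notation so that ElementTree declares the text namespace itself.

```python
import io
from xml.etree import ElementTree as ET

# Shortened stand-in for content-scaffold.xml (assumption: same structure, fewer namespaces).
SCAFFOLD = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    office:version="1.2">
  <office:body>
    <office:text/>
  </office:body>
</office:document-content>
"""


def collect_namespaces(source):
    """Return {prefix: uri} for every xmlns declaration found in the document."""
    return dict(ns for _event, ns in ET.iterparse(source, events=("start-ns",)))


def used_uris(root):
    """Return the namespace URIs that appear in element or attribute names."""
    uris = set()
    for element in root.iter():
        for name in (element.tag, *element.attrib):
            if isinstance(name, str) and name.startswith("{"):
                uris.add(name[1:].partition("}")[0])
    return uris


namespaces = collect_namespaces(io.BytesIO(SCAFFOLD))
root = ET.fromstring(SCAFFOLD)

office_text = root.find("office:body/office:text", namespaces)
p = ET.SubElement(office_text, "{%s}p" % namespaces["text"])
p.text = "Hello"

in_use = used_uris(root)
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)  # keep the original prefixes in the output
    if uri not in in_use:
        # ElementTree declares used namespaces itself; add only the unused ones.
        root.set("xmlns:" + prefix, uri)

output = ET.tostring(root, encoding="unicode")
print(output)
```

Note that ET.register_namespace() changes a global registry, so the registered prefixes also affect any other document serialized later in the same process.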
Here is how to do it with lxml:
from lxml import etree as ET
NAMESPACES = {"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0"}
document_content = ET.parse('content-scaffold.xml')
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p')
p.text = "Hello"
document_content.write("output.xml")
Note that you have to provide the element name in Clark notation, i.e. {namespace-uri}localname, in SubElement().
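To avoid typing the full URI each time, a small helper can translate a prefixed name into Clark notation. This is only an illustrative sketch: the function name clark is made up, the NAMESPACES dictionary is cut down to two entries, and the standard library ElementTree is used so that the snippet runs without lxml (the same tag string can be passed to lxml's SubElement()).

```python
from xml.etree import ElementTree as ET

NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def clark(prefixed_name, namespaces=NAMESPACES):
    """Turn 'prefix:local' into '{uri}local' using the given prefix map."""
    prefix, _, local = prefixed_name.partition(":")
    return "{%s}%s" % (namespaces[prefix], local)


tag = clark("text:p")
parent = ET.Element(clark("office:text"))
child = ET.SubElement(parent, tag)
child.text = "Hello"

print(tag)
```

An unknown prefix raises KeyError here, which is probably preferable to silently writing an undeclared prefix into the document.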