Extract a portion of an XML file without using tostring in Python

Question

Suppose I have XML like this:

<a>
 <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" 
  Key="value">
  <b>
  <c>some_c_attrib</c>
  <d>some_d_attrib</d>
  <e>some_e_attrib</e>
   <f>some_f_attrib</f>
   <g>some_g_attrib</g>
  </b>
  <h>
   <i>some_i_attrib</i>
   <j>some_j_attrib</j>
  </h>
 </na:Data>
 <da:Newtag xmlns:da="http://new_site.com">
  <k name="http://new_new_site.com"/>

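There are a few more lines after this. I parsed the XML with ET.parse(FILENAME) and then wrote it to a new file with write_c14n("new.xml"). Now I want to extract a portion of new.xml into another XML file: only the part that starts at <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value"> and ends at </h>.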

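However, I don't want to use tostring(), because it does not preserve the canonicalization of the XML that I got from write_c14n(). I wondered whether simply copying that part out of new.xml and writing it to another XML file would help, but I suspect it would add extra newlines in between and would not preserve the XML's formatting as-is.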
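On the tostring() concern: lxml's etree.tostring() also accepts method="c14n", which serializes an element in canonical (C14N 1.0) form instead of the default serialization, so the output should match what write_c14n() produces for the same subtree. A minimal sketch, assuming lxml is installed and using a hypothetical, trimmed and well-formed stand-in for the sample XML (the closing tags are assumed, since the original is cut off):

```python
from lxml import etree

# Hypothetical, trimmed, well-formed stand-in for the question's XML.
SAMPLE = b"""<a>
 <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">
  <b>
   <c>some_c_attrib</c>
  </b>
  <h>
   <i>some_i_attrib</i>
  </h>
 </na:Data>
</a>"""

root = etree.fromstring(SAMPLE)
data = root.find(".//na:Data", namespaces={"na": "http://some_site.com#"})

# method="c14n" makes tostring() emit canonical XML for just this subtree.
# It returns UTF-8 bytes and does not accept an encoding argument.
canonical = etree.tostring(data, method="c14n")
print(canonical.decode("utf-8"))
```

Because C14N sorts attributes, Key appears before Ref in the output, but the na prefix and both attributes are kept.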

I've tried the following:

With the code below I tried to create another XML whose new root is <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">:

from lxml import etree

tree = etree.parse('file_location/file_to_read.xml')
root = tree.getroot()

# New root built from the namespaced tag alone (no attributes, no prefix mapping)
sub_root = etree.Element('{http://some_site.com#}Data')
for node in root.find('.//na:Data', namespaces={'na': 'http://some_site.com#'}):
    # append() moves each child of na:Data into sub_root
    sub_root.append(node)

new_tree = etree.ElementTree(sub_root)

I only need new_tree as an object so I can keep working with it. However, if I print new_tree with tostring() [i.e. print etree.tostring(new_tree, pretty_print=True)], this is the output I get:

<ns0:Data xmlns:ns0="http://some_site.com#"><b>
 <c>some_c_attrib</c>
 <d>some_d_attrib</d>
 <e>some_e_attrib</e>
  <f>some_f_attrib</f>
  <g>some_g_attrib</g>
 </b>
 <h>
  <i>some_i_attrib</i>
  <j>some_j_attrib</j>
 </h>
</ns0:Data>

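As you can see, na:Data has been replaced by ns0:Data, and its attributes (Ref="http://another_site.com" Key="value") are lost as well. I need a way to extract that part of the XML as-is, with all of its attributes, keys and values.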
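Why this happens: etree.Element('{http://some_site.com#}Data') receives only a namespaced tag, so lxml has no prefix for that namespace and invents ns0, and Ref/Key are never copied. If a separate root element is really wanted, one option (an assumption, not from the thread) is to pass the original nsmap and attributes to Element(), or to deep-copy the whole subtree. A sketch using a hypothetical, trimmed stand-in for the question's XML:

```python
from copy import deepcopy

from lxml import etree

# Hypothetical, trimmed, well-formed stand-in for the question's XML.
SAMPLE = b"""<a>
 <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">
  <b>
   <c>some_c_attrib</c>
  </b>
  <h>
   <i>some_i_attrib</i>
  </h>
 </na:Data>
</a>"""

root = etree.fromstring(SAMPLE)
data = root.find(".//na:Data", namespaces={"na": "http://some_site.com#"})

# Option 1: rebuild the root as in the question, but carry over the original
# prefix mapping (nsmap) and attributes, and copy (not move) the children.
sub_root = etree.Element(data.tag, attrib=dict(data.attrib), nsmap=data.nsmap)
sub_root.text = data.text  # keep the whitespace before the first child
for child in data:
    sub_root.append(deepcopy(child))
rebuilt = etree.ElementTree(sub_root)

# Option 2: deep-copy the whole na:Data subtree; tag, nsmap, attributes
# and children all come along.
copied = etree.ElementTree(deepcopy(data))

print(etree.tostring(rebuilt).decode("utf-8"))
```

deepcopy() is used instead of append(node) because an lxml element can only have one parent, so appending it moves it out of the source tree.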

Answer 1

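There is no need to create a new element. Just parse the original XML file, find the na:Data subelement, and write it to a new file with write_c14n().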

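from lxml import etree

tree = etree.parse('file_location/file_to_read.xml')
# Find na:Data; it keeps its own prefix, attributes and children
data = tree.find('.//na:Data', namespaces={'na': 'http://some_site.com#'})
# Wrap it in a new ElementTree so write_c14n() serializes only that subtree
etree.ElementTree(data).write_c14n("new.xml")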
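To try this approach without touching files, the same calls can work on in-memory data: etree.parse() accepts a file-like object, and write_c14n() accepts either a filename (as above) or a writable binary file object. A minimal sketch, assuming lxml and a hypothetical, trimmed stand-in for the question's XML that keeps the da:Newtag sibling to show it is left out:

```python
from io import BytesIO

from lxml import etree

# Hypothetical, trimmed, well-formed stand-in for the question's XML.
SAMPLE = b"""<a>
 <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">
  <b>
   <c>some_c_attrib</c>
  </b>
  <h>
   <i>some_i_attrib</i>
  </h>
 </na:Data>
 <da:Newtag xmlns:da="http://new_site.com">
  <k name="http://new_new_site.com"/>
 </da:Newtag>
</a>"""

tree = etree.parse(BytesIO(SAMPLE))  # in place of the real file path
data = tree.find(".//na:Data", namespaces={"na": "http://some_site.com#"})

# Only the na:Data subtree is canonicalized; C14N output is always UTF-8.
out = BytesIO()
etree.ElementTree(data).write_c14n(out)
result = out.getvalue()
print(result.decode("utf-8"))
```

write_c14n() also takes exclusive=True if exclusive canonicalization is preferred.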


Tags: lxml, elementtree, python-2.7, c14n