Faithfully Preserve Comments in Parsed XML
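I'd like to preserve comments as faithfully as possible while manipulating XML.
I managed to keep the comments, but their content is getting XML-escaped.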
#!/usr/bin/env python
# add_host_to_tomcat.py
import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()
if __name__ == '__main__':
    filename = "/opt/lucee/tomcat/conf/server.xml"
    # this is the important part: use the comment-preserving parser
    tree = ET.parse(filename, parser)
    # get the node to add a child to
    engine_node = tree.find("./Service/Engine")
    # add a node: Engine.Host
    host_node = ET.SubElement(
        engine_node,
        "Host",
        name="local.mysite.com",
        appBase="webapps"
    )
    # add a child to new node: Engine.Host.Context
    ET.SubElement(
        host_node,
        'Context',
        path="",
        docBase="/path/to/doc/base"
    )
    tree.write('out.xml')
#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree
class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment
    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )
However, the following comment:
<!--
EXAMPLE HOST ENTRY:
<Host name="lucee.org" appBase="webapps">
<Context path="" docBase="/var/sites/getrailo.org" />
<Alias>www.lucee.org</Alias>
<Alias>my.lucee.org</Alias>
</Host>
HOST ENTRY TEMPLATE:
<Host name="[ENTER DOMAIN NAME]" appBase="webapps">
<Context path="" docBase="[ENTER SYSTEM PATH]" />
<Alias>[ENTER DOMAIN ALIAS]</Alias>
</Host>
-->
ends up as:
<!--
EXAMPLE HOST ENTRY:
&lt;Host name="lucee.org" appBase="webapps"&gt;
&lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
&lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
&lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
&lt;/Host&gt;
HOST ENTRY TEMPLATE:
&lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
&lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
&lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
&lt;/Host&gt;
-->
I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to make any difference. In fact, I think the problem happens somewhere after the handle_comment() step.
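Python 3's ElementTree escapes `<`, `>` and `&` in the text of ordinary elements but writes the text of a `Comment` node verbatim. So escaped output is consistent with the comment text reaching the serializer as ordinary element text rather than as a real `Comment` node. That reading is an assumption, not something confirmed in this thread; the sketch below (Python 3 assumed, `note` is a made-up tag) only shows the two serialization paths side by side.

```python
import xml.etree.ElementTree as ET

MARKUP = '<Host name="lucee.org" appBase="webapps"/>'

# 1) Markup stored as the text of an ordinary element: escaped on output.
plain = ET.Element("note")
plain.text = MARKUP
escaped = ET.tostring(plain, encoding="unicode")

# 2) The same markup stored in a real Comment node: written out verbatim.
comment = ET.Comment(MARKUP)
verbatim = ET.tostring(comment, encoding="unicode")

print(escaped)
print(verbatim)
```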
By the way, this question is similar to this one.
Tested with Python 2.7 and 3.5, the following code should work as intended.
#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree
class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
Then, in the main code, use
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
as the parser instead of the current one.
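A self-contained round trip with the TreeBuilder subclass above, as a sketch assuming Python 3 (the `encoding="unicode"` argument to `tostring` is Python 3 only) and a made-up sample document standing in for server.xml:

```python
import xml.etree.ElementTree as ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    # Called by XMLParser for every comment; store it as a Comment node.
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


# Invented sample input: the comment contains markup that must not be escaped.
SOURCE = """<Server>
  <!-- EXAMPLE: <Host name="lucee.org" appBase="webapps"/> -->
  <Service><Engine/></Service>
</Server>"""

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
root = ElementTree.fromstring(SOURCE, parser=parser)
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

Because the comment is built as a node whose tag is `ElementTree.Comment`, the serializer emits it as `<!--...-->` with its content untouched.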
By the way, comments work out of the box with lxml. That is, you can do
import lxml.etree as ET
tree = ET.parse(filename)
without any of the above.
Martin's code didn't work for me. I modified it as follows, and it works as expected.
import xml.etree.ElementTree as ET
class CommentedTreeBuilder(ET.XMLTreeBuilder):
    def __init__(self, *args, **kwargs):
        super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
        self._parser.CommentHandler = self.comment
    def comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)
Here's the test:
parser=CommentedTreeBuilder()
tree = ET.parse(filename, parser)
tree.write('out.xml')
It looks like neither @Martin's nor @sukhbinder's answer works for me... so here is a complete solution that works on Python 3.x:
from xml.etree import ElementTree
string = '''<?xml version="1.0"?>
<data>
<!--Test
-->
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
</data>'''
class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
tree = ElementTree.fromstring(string, parser)
print(tree[0].text)  # the first child of <data> is the preserved comment
# or ElementTree.parse(filename, parser)
Python 3.8 added an insert_comments argument to TreeBuilder:
class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)
When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).
Example:
parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
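A minimal sketch of the documented behavior, assuming Python 3.8+. The input string is invented so that one comment sits before the root element and one inside it:

```python
import xml.etree.ElementTree as ElementTree

SOURCE = "<!-- before root --><root><!-- inside: <b>kept</b> --><child/></root>"

# insert_comments=True is only accepted by TreeBuilder on Python 3.8+.
parser = ElementTree.XMLParser(
    target=ElementTree.TreeBuilder(insert_comments=True)
)
root = ElementTree.fromstring(SOURCE, parser=parser)
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

Only the inner comment survives, matching the "within the root element (but not outside of it)" note quoted above; the inner markup is written back unescaped.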
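Martin's answer is correct, it's just missing some code. I know this may be obvious to more experienced programmers, but as a new programmer it took me a minute to figure out.
Martin's answer: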
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
class CommentedTreeBuilder(ElementTree.TreeBuilder):
    # Retains comments, which the default XML parser discards
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
# the missing part:
def parse_xml_with_remarks(filepath):
    ctb = CommentedTreeBuilder()
    xp = ET.XMLParser(target=ctb)
    tree = ET.parse(filepath, parser=xp)
    return tree
# parsing the file, and getting root
# (`file` stands for the path of your XML file)
tree=parse_xml_with_remarks(file)
root=tree.getroot()
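Putting the pieces together for the original task, here is a sketch that parses with `parse_xml_with_remarks`, adds the Host/Context nodes from the question, and writes the result back out. It assumes Python 3; `SERVER_XML` is a hypothetical, heavily trimmed stand-in for Tomcat's server.xml, read from an in-memory buffer instead of a file.

```python
import io
import xml.etree.ElementTree as ET


class CommentedTreeBuilder(ET.TreeBuilder):
    # Keep every parsed comment as a Comment node (Martin's approach).
    def comment(self, data):
        self.start(ET.Comment, {})
        self.data(data)
        self.end(ET.Comment)


def parse_xml_with_remarks(source):
    # `source` may be a path or a file-like object, just as with ET.parse.
    parser = ET.XMLParser(target=CommentedTreeBuilder())
    return ET.parse(source, parser=parser)


# Hypothetical sample input, not the real server.xml.
SERVER_XML = """<Server>
  <Service>
    <Engine>
      <!-- TEMPLATE: <Host name="[ENTER DOMAIN NAME]" appBase="webapps"/> -->
    </Engine>
  </Service>
</Server>"""

tree = parse_xml_with_remarks(io.StringIO(SERVER_XML))

# Same edits as in the question: Engine.Host, then Engine.Host.Context.
engine_node = tree.find("./Service/Engine")
host_node = ET.SubElement(engine_node, "Host", name="local.mysite.com", appBase="webapps")
ET.SubElement(host_node, "Context", path="", docBase="/path/to/doc/base")

# Write to a text buffer; pass a filename instead to write a real file.
out = io.StringIO()
tree.write(out, encoding="unicode")
result = out.getvalue()
print(result)
```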