Access the processing instructions before/after a root element with lxml
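Using lxml, how can I access or iterate over the processing instructions that come before the root's opening tag or after its closing tag?
I tried the following, but, as documented, it only iterates inside the root element: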
import io
from lxml import etree
content = """\
<?before1?>
<?before2?>
<root>text</root>
<?after1?>
<?after2?>
"""
source = etree.parse(io.StringIO(content))
print(etree.tostring(source, encoding="unicode"))
# -> <?before1?><?before2?><root>text</root><?after1?><?after2?>
for node in source.iter():
    print(type(node))
# -> <class 'lxml.etree._Element'>
The only workaround I have found is to wrap the XML in a dummy element:
dummy_content = "<dummy>{}</dummy>".format(etree.tostring(source, encoding="unicode"))
dummy = etree.parse(io.StringIO(dummy_content))
for node in dummy.iter():
    print(type(node))
# -> <class 'lxml.etree._Element'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._Element'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._ProcessingInstruction'>
Is there a better solution?
You can use the getprevious() and getnext() methods on the root element.
before2 = source.getroot().getprevious()
before1 = before2.getprevious()
after1 = source.getroot().getnext()
after2 = after1.getnext()
See https://lxml.de/api/lxml.etree._Element-class.html.
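The getprevious()/getnext() chain hard-codes exactly two PIs on each side; if there are fewer, getprevious() returns None and the following call fails with AttributeError. A minimal sketch of a more general approach, assuming you want the same node sequence the dummy-wrapper workaround produced but without re-parsing: walk the root's siblings with lxml's _Element.itersiblings(). The helper name iter_document is hypothetical, not part of lxml.

```python
import io
from lxml import etree


def iter_document(tree):
    """Hypothetical helper: yield the top-level nodes before the root, then the
    root and its descendants, then the top-level nodes after the root."""
    root = tree.getroot()
    # itersiblings(preceding=True) starts at the nearest sibling and walks
    # backwards, so reverse it to restore document order.
    yield from reversed(list(root.itersiblings(preceding=True)))
    # root.iter() yields the root element itself plus everything inside it.
    yield from root.iter()
    # Following siblings are already produced in document order.
    yield from root.itersiblings()


content = """\
<?before1?>
<?before2?>
<root>text</root>
<?after1?>
<?after2?>
"""

tree = etree.parse(io.StringIO(content))
labels = [
    node.target if isinstance(node, etree._ProcessingInstruction) else node.tag
    for node in iter_document(tree)
]
print(labels)  # ['before1', 'before2', 'root', 'after1', 'after2']
```

itersiblings() also yields comments, so a comment outside the root would appear in the sequence too; filter with isinstance(node, etree._ProcessingInstruction) if only PIs are wanted.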
XPath can also be used (on an ElementTree or an Element instance):
before = source.xpath("preceding-sibling::node()") # List of two PIs
after = source.xpath("following-sibling::node()")
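node() matches every kind of sibling, so a comment outside the root would be returned alongside the PIs. A small sketch of narrowing the XPath to processing instructions only with the processing-instruction() node test, which optionally takes a target name. The comment in the sample document is added purely for illustration, and the sketch calls xpath() once on the ElementTree and otherwise on the root Element, since the answer says both work.

```python
import io
from lxml import etree

# The question's document plus one comment (added for illustration) to show
# the difference between node() and processing-instruction().
content = """\
<?before1?>
<!-- prologue comment -->
<?before2?>
<root>text</root>
<?after1?>
<?after2?>
"""

tree = etree.parse(io.StringIO(content))
root = tree.getroot()

# node() matches any sibling node, so the comment is included.
all_before = tree.xpath("preceding-sibling::node()")
# processing-instruction() restricts the match to PIs.
pis_before = root.xpath("preceding-sibling::processing-instruction()")
pis_after = root.xpath("following-sibling::processing-instruction()")
# A string argument selects only PIs with that target name.
after2 = root.xpath("following-sibling::processing-instruction('after2')")

print(len(all_before), len(pis_before), len(pis_after))  # 3 2 2
print([pi.target for pi in after2])  # ['after2']
```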