Get xpath of all nodes in XML tree with attributes - Python
Suppose I have the following test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parent>
    <FirstNode name="FirstNodeName"></FirstNode>
    <Child1>Test from Child1</Child1>
    <SecondNode name="SecondNodeName" type="SecondNodeType">
      <Child2>
        <GrandChild>Test from GrandChild</GrandChild>
      </Child2>
    </SecondNode>
  </Parent>
</test:myXML>
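I want to walk the whole tree and get the path of every node, including its attributes. I can already traverse the tree and retrieve each node's path like this: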
from lxml import etree
xmlDoc = etree.parse("test.xml")
root = xmlDoc.getroot()
# iter() visits every node in document order; getpath() returns its XPath
for node in xmlDoc.iter():
    print("path:", xmlDoc.getpath(node))
As expected, this prints:
path: /test:myXML
path: /test:myXML/Parent
path: /test:myXML/Parent/FirstNode
path: /test:myXML/Parent/Child1
path: /test:myXML/Parent/SecondNode
path: /test:myXML/Parent/SecondNode/Child2
path: /test:myXML/Parent/SecondNode/Child2/GrandChild
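However, as I mentioned, I would like to somehow print the attributes of each node and of its parents along with the path. For example, if I print the element "Child2", I want the attributes of each of its parent elements displayed as well. Something like: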
path: /test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2
Is this possible? I don't really care about the namespace of the root element, if that makes things easier.
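I'm not aware of any prepackaged way to do this, but with all the mandatory "working from home" going on, I figured I might as well try to come up with something. It's not elegant, but it seems to do the job...
Try this on your actual code and see if it works: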
att = """
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parent>
    <FirstNode name="FirstNodeName"></FirstNode>
    <Child1>Test from Child1</Child1>
    <SecondNode name="SecondNodeName" type="SecondNodeType">
      <Child2>
        <GrandChild>Test from GrandChild</GrandChild>
      </Child2>
    </SecondNode>
  </Parent>
</test:myXML>
"""
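from lxml import etree
bef = []  # plain XPath of every node, in document order
xps = []  # the same paths with the node's own attributes appended
xmlDoc = etree.fromstring(att)
root = etree.ElementTree(xmlDoc)
for node in xmlDoc.iter():
    # Render this node's attributes as {key="value" ...}
    ats = "{"
    for key, value in node.items():
        ats += key + '="' + value + '" '
    ats += '}'
    xp = root.getpath(node)
    bef.append(xp)
    ent = xp
    if len(ats) > 2:  # the node has at least one attribute
        ent += ats.replace(' }', '}')
    xps.append(ent)
for b, f in zip(bef, xps):
    prev = bef.index(b) - 1
    if prev >= 0:  # the root element has no predecessor and is skipped
        cur = b.rsplit("/", 1)[0]      # plain path of the parent
        new_cur = f.rsplit("/", 1)[1]  # last step, with its attributes
        if bef[prev] == cur:
            # The previous node is the parent: reuse its annotated path
            new_f = xps[prev] + '/' + new_cur
            xps[prev + 1] = new_f
            print(new_f)
        else:
            print(f)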
Output:
/test:myXML/Parent
/test:myXML/Parent/FirstNode{name="FirstNodeName"}
/test:myXML/Parent/Child1
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2/GrandChild
If it works and you're interested, I can try to explain what all of this does...
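One thing to watch for: the second loop above only reuses the annotated parent path when the node just before it in document order is its parent, so a later sibling would be printed with only its own attributes. A possible alternative, offered as a rough sketch rather than a tested replacement, is to walk the tree recursively and hand each child the already-annotated path of its parent. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The helper names (local_name, format_attrs, annotated_paths) are made up for illustration. It drops the namespace from tag names, which the question says is acceptable, and unlike getpath() it does not add positional indexes such as [2] for repeated siblings.

```python
import xml.etree.ElementTree as ET

XML = """<test:myXML xmlns:test="http://com/my/namespace">
  <Parent>
    <FirstNode name="FirstNodeName"></FirstNode>
    <Child1>Test from Child1</Child1>
    <SecondNode name="SecondNodeName" type="SecondNodeType">
      <Child2>
        <GrandChild>Test from GrandChild</GrandChild>
      </Child2>
    </SecondNode>
  </Parent>
</test:myXML>"""


def local_name(tag):
    # ElementTree stores namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def format_attrs(elem):
    # Render attributes as {k="v" k2="v2"}, or "" when there are none.
    if not elem.attrib:
        return ""
    return "{" + " ".join(f'{k}="{v}"' for k, v in elem.attrib.items()) + "}"


def annotated_paths(elem, parent_path=""):
    # Yield one path per element, passing the annotated path down to children.
    if not isinstance(elem.tag, str):
        return  # skip comments / processing instructions if the parser kept them
    path = f"{parent_path}/{local_name(elem.tag)}{format_attrs(elem)}"
    yield path
    for child in elem:
        yield from annotated_paths(child, path)


paths = list(annotated_paths(ET.fromstring(XML)))
for p in paths:
    print("path:", p)
```

Because every child receives its parent's finished path, the attributes of all ancestors appear no matter how many siblings come before it. The root element is included in the output here, whereas the loop above skips it.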