递归搜索父子组合并在 python 和 XML 中构建树
Recursively search for parent child combinations and build tree in python and XML
我正在尝试遍历这个充满父->子关系的 XML 数据,需要一种构建树的方法。任何帮助将不胜感激。另外,在这种情况下,为父-->子关系提供属性或节点更好吗?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
在 Python 脚本上,这就是我所拥有的。我的大脑被炸了,我无法理解逻辑?请帮忙
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
def find_node(data,search):
#str = root.find('.//node[@child="1.2.1"]')
for node in data.findall('.//node'):
if node.attrib['name']==search:
print('Child-->', node)
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
print (parent,'-->', child)
find_node(root,child)
预期的可能输出是这样的(真的不关心排序顺序,只要所有节点项都在树中的某个位置表示即可。
Car --> Engine --> Piston
Car --> Engine --> Carb --> Float
Car --> Engine --> Carb --> Bolt --> Thread
Car --> Wheel --> Hubcaps
Truck --> Engine --> Piston
Truck --> Engine --> Carb --> Bolt --> Thread
Truck --> Loading Bin
Spare Wheel -->
我已经有很长时间没有用图表做任何事情了,但这应该非常接近,它不是最佳方法:
x = """<?xml version="1.0"?>
<nodes>
<node name="Car" child="Engine"></node>
<node name="Engine" child="Piston"></node>
<node name="Engine" child="Carb"></node>
<node name="Car" child="Wheel"></node>
<node name="Wheel" child="Hubcaps"></node>
<node name="Truck" child="Engine"></node>
<node name="Truck" child="Loading Bin"></node>
<nested>
<node name="Spare Wheel" child="Engine"></node>
</nested>
<node name="Spare Wheel" child=""></node>
</nodes>"""
from lxml import etree
xml = etree.fromstring(x)
graph = {}
nodes = set()
for x in xml.xpath("//node"):
par, child = x.xpath(".//@name")[0], x.xpath(".//@child")[0]
graph.setdefault(par, set())
graph[par].add(child)
nodes.update([child, par])
def find_all_paths(graph, start, end, path=None):
if path is None:
path = []
path = path + [start]
if start == end:
yield path
for node in graph.get(start, []):
if node not in path:
for new_path in find_all_paths(graph, node, end, path):
yield new_path
for n in graph:
for e in nodes:
if n != e:
for path in find_all_paths(graph, n, e):
if path:
print("--> ".join(path))
更新后的输入会给你:
Engine--> Carb
Engine--> Piston
Car--> Engine
Car--> Wheel
Car--> Wheel--> Hubcaps
Car--> Engine--> Carb
Car--> Engine--> Piston
Spare Wheel--> Engine
Spare Wheel-->
Spare Wheel--> Engine--> Carb
Spare Wheel--> Engine--> Piston
Wheel--> Hubcaps
Truck--> Engine
Truck--> Engine--> Carb
Truck--> Engine--> Piston
Truck--> Loading Bin
rec.xml:
<?xml version="1.0"?>
<nodes>
<node name="Car" child="Engine"></node>
<node name="Engine" child="Piston"></node>
<node name="Engine" child="Carb"></node>
<node name="Car" child="Wheel"></node>
<node name="Wheel" child="Hubcaps"></node>
<node name="Truck" child="Engine"></node>
<node name="Truck" child="Loading Bin"></node>
<node name="Piston" child="Loa"></node>
<node name="Piston" child="Loaqq"></node>
<node name="Piston" child="Loaww"></node>
<node name="Loaww" child="Loawwqqqqq"></node>
<node name="Spare Wheel" child=""></node>
</nodes>
parse.py:-
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
data = {}
child_list = []
def recursive_print(string,x):
if x in data.keys():
for x_child in data[x]:
if x_child in data.keys():
recursive_print(string+'-------->'+x_child,x_child)
else:
print string+'-------->'+x_child
else:
print string
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
child_list.append(child)
if parent not in data.keys():
data[parent] = []
data[parent].append(child)
for key in data.keys():
if key not in child_list:
for x in data[key]:
string = key+'------->'+x
recursive_print(string,x)
输出:-
Spare Wheel------->
Car------->Engine-------->Piston-------->Loa
Car------->Engine-------->Piston-------->Loaqq
Car------->Engine-------->Piston-------->Loaww-------->Loawwqqqqq
Car------->Engine-------->Carb
Car------->Wheel-------->Hubcaps
Truck------->Engine-------->Piston-------->Loa
Truck------->Engine-------->Piston-------->Loaqq
Truck------->Engine-------->Piston-------->Loaww-------->Loawwqqqqq
Truck------->Engine-------->Carb
Truck------->Loading Bin
rec.xml:-
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
Pars.py
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
tree = []
tree1 = []
uses_list = []
class Tree(object):
def __init__(self):
self.child = []
self.data = None
def tree_print(parent,x):
string = x.data
if x.data is not None:
if len(x.child) == 0:
print parent,x.data
if len(x.child) > 0:
for x1 in x.child:
tree_print(parent+x.data+'----------->',x1)
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
uses_list.append(child)
if parent not in uses_list:
tree_root = Tree()
tree_root.data = parent
if child is not None:
child_obj = Tree()
child_obj.data = child
tree_root.child.append(child_obj)
tree.append(child_obj)
tree1.append(tree_root)
tree.append(tree_root)
else:
for tree_root in tree:
for child_o in tree_root.child:
if parent == child_o.data:
if child is not None:
child_obj = Tree()
child_obj.data = child
child_o.child.append(child_obj)
tree.append(child_obj)
for x in tree1:
tree_print('',x)
输出:-
Car----------->Engine-----------> Piston
Car----------->Engine----------->Carb----------->Bolt-----------> Thread
Car----------->Engine----------->Carb-----------> Foat
Car----------->Engine-----------> Bolt
Car----------->Wheel-----------> Hubcap
Spare Wheel
Truck----------->Engine-----------> Bolt
这是一个纯 XSLT 解决方案 -- 有效地使用键(相当于哈希表)并且只有 23 行 -- 迄今为止最短的解决方案。
这也是计算上最简单的一个——比较嵌套级别 1 和嵌套级别 4 - 5 ...
这个解决方案是尾递归,这意味着任何好的 XSLT 处理器都会通过迭代对其进行优化,从而避免堆栈溢出的可能性,因为最大调用堆栈深度保持不变(1).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="kNodeByChild" match="node" use="@child"/>
<xsl:key name="kNodeByName" match="node" use="@name"/>
<xsl:template match="/*">
<xsl:apply-templates select="node[not(key('kNodeByChild', @name))]"/>
</xsl:template>
<xsl:template match="node[not(key('kNodeByName', @child))]">
<xsl:param name="pParentPath"/>
<xsl:value-of select="concat($pParentPath, @name, ' ---> ', @child, '
')"/>
</xsl:template>
<xsl:template match="node">
<xsl:param name="pParentPath"/>
<xsl:apply-templates select="key('kNodeByName', @child)">
<xsl:with-param name="pParentPath" select="concat($pParentPath, @name, ' ---> ')"/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
当此转换应用于提供的 XML 文档时:
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
产生了想要的、正确的结果:
Car ---> Engine ---> Piston
Car ---> Engine ---> Carb ---> Bolt ---> Thread
Car ---> Engine ---> Carb ---> Foat
Car ---> Engine ---> Bolt ---> Thread
Car ---> Wheel ---> Hubcap
Spare Wheel --->
Truck ---> Engine ---> Piston
Truck ---> Engine ---> Carb ---> Bolt ---> Thread
Truck ---> Engine ---> Carb ---> Foat
Truck ---> Engine ---> Bolt ---> Thread
我正在尝试遍历这个充满父->子关系的 XML 数据,需要一种构建树的方法。任何帮助将不胜感激。另外,在这种情况下,为父-->子关系提供属性或节点更好吗?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
在 Python 脚本上,这就是我所拥有的。我的大脑被炸了,我无法理解逻辑?请帮忙
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
def find_node(data,search):
#str = root.find('.//node[@child="1.2.1"]')
for node in data.findall('.//node'):
if node.attrib['name']==search:
print('Child-->', node)
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
print (parent,'-->', child)
find_node(root,child)
预期的可能输出是这样的(真的不关心排序顺序,只要所有节点项都在树中的某个位置表示即可。
Car --> Engine --> Piston
Car --> Engine --> Carb --> Float
Car --> Engine --> Carb --> Bolt --> Thread
Car --> Wheel --> Hubcaps
Truck --> Engine --> Piston
Truck --> Engine --> Carb --> Bolt --> Thread
Truck --> Loading Bin
Spare Wheel -->
我已经有很长时间没有用图表做任何事情了,但这应该非常接近,它不是最佳方法:
x = """<?xml version="1.0"?>
<nodes>
<node name="Car" child="Engine"></node>
<node name="Engine" child="Piston"></node>
<node name="Engine" child="Carb"></node>
<node name="Car" child="Wheel"></node>
<node name="Wheel" child="Hubcaps"></node>
<node name="Truck" child="Engine"></node>
<node name="Truck" child="Loading Bin"></node>
<nested>
<node name="Spare Wheel" child="Engine"></node>
</nested>
<node name="Spare Wheel" child=""></node>
</nodes>"""
from lxml import etree
xml = etree.fromstring(x)
graph = {}
nodes = set()
for x in xml.xpath("//node"):
par, child = x.xpath(".//@name")[0], x.xpath(".//@child")[0]
graph.setdefault(par, set())
graph[par].add(child)
nodes.update([child, par])
def find_all_paths(graph, start, end, path=None):
if path is None:
path = []
path = path + [start]
if start == end:
yield path
for node in graph.get(start, []):
if node not in path:
for new_path in find_all_paths(graph, node, end, path):
yield new_path
for n in graph:
for e in nodes:
if n != e:
for path in find_all_paths(graph, n, e):
if path:
print("--> ".join(path))
更新后的输入会给你:
Engine--> Carb
Engine--> Piston
Car--> Engine
Car--> Wheel
Car--> Wheel--> Hubcaps
Car--> Engine--> Carb
Car--> Engine--> Piston
Spare Wheel--> Engine
Spare Wheel-->
Spare Wheel--> Engine--> Carb
Spare Wheel--> Engine--> Piston
Wheel--> Hubcaps
Truck--> Engine
Truck--> Engine--> Carb
Truck--> Engine--> Piston
Truck--> Loading Bin
rec.xml:
<?xml version="1.0"?>
<nodes>
<node name="Car" child="Engine"></node>
<node name="Engine" child="Piston"></node>
<node name="Engine" child="Carb"></node>
<node name="Car" child="Wheel"></node>
<node name="Wheel" child="Hubcaps"></node>
<node name="Truck" child="Engine"></node>
<node name="Truck" child="Loading Bin"></node>
<node name="Piston" child="Loa"></node>
<node name="Piston" child="Loaqq"></node>
<node name="Piston" child="Loaww"></node>
<node name="Loaww" child="Loawwqqqqq"></node>
<node name="Spare Wheel" child=""></node>
</nodes>
parse.py:-
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
data = {}
child_list = []
def recursive_print(string,x):
if x in data.keys():
for x_child in data[x]:
if x_child in data.keys():
recursive_print(string+'-------->'+x_child,x_child)
else:
print string+'-------->'+x_child
else:
print string
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
child_list.append(child)
if parent not in data.keys():
data[parent] = []
data[parent].append(child)
for key in data.keys():
if key not in child_list:
for x in data[key]:
string = key+'------->'+x
recursive_print(string,x)
输出:-
Spare Wheel------->
Car------->Engine-------->Piston-------->Loa
Car------->Engine-------->Piston-------->Loaqq
Car------->Engine-------->Piston-------->Loaww-------->Loawwqqqqq
Car------->Engine-------->Carb
Car------->Wheel-------->Hubcaps
Truck------->Engine-------->Piston-------->Loa
Truck------->Engine-------->Piston-------->Loaqq
Truck------->Engine-------->Piston-------->Loaww-------->Loawwqqqqq
Truck------->Engine-------->Carb
Truck------->Loading Bin
rec.xml:-
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
Pars.py
import xml.etree.ElementTree as ET
tree = ET.parse('rec.xml')
root = tree.getroot()
tree = []
tree1 = []
uses_list = []
class Tree(object):
def __init__(self):
self.child = []
self.data = None
def tree_print(parent,x):
string = x.data
if x.data is not None:
if len(x.child) == 0:
print parent,x.data
if len(x.child) > 0:
for x1 in x.child:
tree_print(parent+x.data+'----------->',x1)
for nodes in root.findall('node'):
parent = nodes.attrib.get('name')
child = nodes.attrib.get('child')
uses_list.append(child)
if parent not in uses_list:
tree_root = Tree()
tree_root.data = parent
if child is not None:
child_obj = Tree()
child_obj.data = child
tree_root.child.append(child_obj)
tree.append(child_obj)
tree1.append(tree_root)
tree.append(tree_root)
else:
for tree_root in tree:
for child_o in tree_root.child:
if parent == child_o.data:
if child is not None:
child_obj = Tree()
child_obj.data = child
child_o.child.append(child_obj)
tree.append(child_obj)
for x in tree1:
tree_print('',x)
输出:-
Car----------->Engine-----------> Piston
Car----------->Engine----------->Carb----------->Bolt-----------> Thread
Car----------->Engine----------->Carb-----------> Foat
Car----------->Engine-----------> Bolt
Car----------->Wheel-----------> Hubcap
Spare Wheel
Truck----------->Engine-----------> Bolt
这是一个纯 XSLT 解决方案 -- 有效地使用键(相当于哈希表)并且只有 23 行 -- 迄今为止最短的解决方案。
这也是计算上最简单的一个——比较嵌套级别 1 和嵌套级别 4 - 5 ...
这个解决方案是尾递归,这意味着任何好的 XSLT 处理器都会通过迭代对其进行优化,从而避免堆栈溢出的可能性,因为最大调用堆栈深度保持不变(1).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="kNodeByChild" match="node" use="@child"/>
<xsl:key name="kNodeByName" match="node" use="@name"/>
<xsl:template match="/*">
<xsl:apply-templates select="node[not(key('kNodeByChild', @name))]"/>
</xsl:template>
<xsl:template match="node[not(key('kNodeByName', @child))]">
<xsl:param name="pParentPath"/>
<xsl:value-of select="concat($pParentPath, @name, ' ---> ', @child, '
')"/>
</xsl:template>
<xsl:template match="node">
<xsl:param name="pParentPath"/>
<xsl:apply-templates select="key('kNodeByName', @child)">
<xsl:with-param name="pParentPath" select="concat($pParentPath, @name, ' ---> ')"/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
当此转换应用于提供的 XML 文档时:
<nodes>
<node name="Car" child="Engine"/>
<node name="Car" child="Wheel"/>
<node name="Engine" child="Piston"/>
<node name="Engine" child="Carb"/>
<node name="Carb" child="Bolt"/>
<node name="Spare Wheel"/>
<node name="Bolt" child="Thread"/>
<node name="Carb" child="Foat"/>
<node name="Truck" child="Engine"/>
<node name="Engine" child="Bolt"/>
<node name="Wheel" child="Hubcap"/>
</nodes>
产生了想要的、正确的结果:
Car ---> Engine ---> Piston
Car ---> Engine ---> Carb ---> Bolt ---> Thread
Car ---> Engine ---> Carb ---> Foat
Car ---> Engine ---> Bolt ---> Thread
Car ---> Wheel ---> Hubcap
Spare Wheel --->
Truck ---> Engine ---> Piston
Truck ---> Engine ---> Carb ---> Bolt ---> Thread
Truck ---> Engine ---> Carb ---> Foat
Truck ---> Engine ---> Bolt ---> Thread