Find hierarchy in XML structure
I'm trying to find the hierarchy of `ToolID` elements and their child elements in my XML, using ElementTree in Python:
```xml
<Node ToolID="19">
<GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
<Properties>
<ChildNodes>
<Node ToolID="11">
<Node ToolID="16">
<GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
<Properties>
<ChildNodes>
<Node ToolID="17">
<GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
<Properties>
<ChildNodes>
<Node ToolID="2">
<GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput">
<Properties>
<EngineSettings EngineDll="AlteryxBasePluginsEngine.dll" EngineDllEntryPoint="AlteryxDbFileInput" />
</Node>
</ChildNodes>
</Node>
<Node ToolID="18">
</ChildNodes>
</Node>
<Node ToolID="13">
<Node ToolID="20">
</ChildNodes>
</Node>
</Nodes>
```
The expected output for the ToolIDs looks like this:
{10:-}, {19:11, 16, 13, 20}, {16:17, 18}, {17:2}, {2:-}, {11:-}, {18:-}, {13:-}, {20:-}
I found a way:
- Find all ToolIDs:
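```python
import xml.etree.ElementTree as ET

root = ET.parse("input.xml").getroot()  # assumed: root element of the parsed document

tool_id_list = []
for elm in root.findall(".//Node"):
    try:
        tool_id_list.append(elm.attrib["ToolID"])
    except KeyError:  # skip Node elements without a ToolID attribute
        pass
```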
- Iterate over the list of all available ToolIDs and search for their child nodes:
```python
node_hierarchy = {el: [] for el in tool_id_list}

def find_child_tools(parent_id_list):
    for parent_id in parent_id_list:
        # "/*" selects every child of the matching Node (GuiSettings, Properties, ChildNodes, ...)
        node = ".//Node[@ToolID='%s']/*" % parent_id
        for elm in root.findall(node):
            # Node elements directly under one of those children, i.e. under ChildNodes
            for elm2 in elm.findall("./Node"):
                child_tool = elm2.attrib["ToolID"]
                node_hierarchy[parent_id].append(child_tool)

find_child_tools(tool_id_list)
print(node_hierarchy)
```
Output:
{'10': [],
'19': ['11', '16', '13', '20'],
'11': ['8'],
'8': [],
'16': ['17', '18'],
'17': ['2'],
'2': [],
'18': ['4'],
'4': [],
'13': ['12', '14', '15'],
'12': [],
'14': [],
'15': [],
'20': []}
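To see the hierarchy itself rather than a flat mapping, a dictionary like the one above can be printed as an indented tree. This is a minimal sketch, assuming `hierarchy` maps each ToolID to the list of its direct children (hard-coded here from the sample's parent/child relations); `find_roots` and `tree_lines` are hypothetical helper names:

```python
def find_roots(hierarchy):
    """Return ToolIDs that never appear as anyone's child (top-level tools)."""
    children = {c for kids in hierarchy.values() for c in kids}
    return [tool_id for tool_id in hierarchy if tool_id not in children]


def tree_lines(hierarchy, tool_id, depth=0):
    """Yield one line per ToolID, indented by depth, walking the tree depth-first."""
    yield "  " * depth + tool_id
    for child in hierarchy.get(tool_id, []):
        yield from tree_lines(hierarchy, child, depth + 1)


# Assumed input: parent ToolID -> list of direct child ToolIDs
hierarchy = {
    "19": ["11", "16", "13", "20"],
    "11": [],
    "16": ["17", "18"],
    "17": ["2"],
    "2": [],
    "18": [],
    "13": [],
    "20": [],
}

lines = [line for root_id in find_roots(hierarchy) for line in tree_lines(hierarchy, root_id)]
print("\n".join(lines))
```

Any key that is never listed as a child (such as `'10'` in the output above) is treated as a separate top-level root.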
If you can switch from ElementTree to lxml (for better XPath support), it looks like you could simplify this to something like the following:
```python
from lxml import etree

tree = etree.parse("input.xml")

nodes = {}
# .//Node was not getting the first Node if it was the root element.
for node in tree.xpath("descendant-or-self::Node"):
    nodes[node.get("ToolID")] = [child.get("ToolID") for child in node.xpath("./ChildNodes/Node")]

print(nodes)
```
Input XML (I tried to make your sample XML well-formed; hopefully the structure is still correct):
```xml
<Node ToolID="19">
  <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
  <Properties/>
  <ChildNodes>
    <Node ToolID="11"/>
    <Node ToolID="16">
      <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
      <Properties/>
      <ChildNodes>
        <Node ToolID="17">
          <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
          <Properties/>
          <ChildNodes>
            <Node ToolID="2">
              <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput"/>
              <Properties/>
              <EngineSettings EngineDll="AlteryxBasePluginsEngine.dll" EngineDllEntryPoint="AlteryxDbFileInput"/>
            </Node>
          </ChildNodes>
        </Node>
        <Node ToolID="18"/>
      </ChildNodes>
    </Node>
    <Node ToolID="13"/>
    <Node ToolID="20"/>
  </ChildNodes>
</Node>
```
Printed output:
{'19': ['11', '16', '13', '20'],
'11': [],
'16': ['17', '18'],
'17': ['2'],
'2': [],
'18': [],
'13': [],
'20': []}
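The question asked for the result in the form `{19:11, 16, 13, 20}`, with `-` for tools that have no children. A small sketch that renders a dictionary like the one printed above in that notation (`format_hierarchy` is a hypothetical name; entries come out in the dictionary's insertion order, which here is document order):

```python
def format_hierarchy(nodes):
    """Render {parent: [children]} as '{parent:child, child}' groups, '-' when childless."""
    parts = []
    for parent, children in nodes.items():
        parts.append("{%s:%s}" % (parent, ", ".join(children) if children else "-"))
    return ", ".join(parts)


nodes = {'19': ['11', '16', '13', '20'], '11': [], '16': ['17', '18'],
         '17': ['2'], '2': [], '18': [], '13': [], '20': []}

print(format_hierarchy(nodes))
# {19:11, 16, 13, 20}, {11:-}, {16:17, 18}, {17:2}, {2:-}, {18:-}, {13:-}, {20:-}
```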
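If the root element of your actual XML is not a `Node`, you can still use ElementTree:

```python
import xml.etree.ElementTree as ET
nodes = {}
for node in ET.parse("input.xml").findall(".//Node"):
    nodes[node.get("ToolID")] = [child.get("ToolID") for child in node.findall("./ChildNodes/Node")]
```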
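Even when the root element *is* a `Node`, the standard library can include it: `Element.iter("Node")` yields the element it is called on as well as all matching descendants in document order, whereas `findall(".//Node")` only searches descendants. A minimal sketch, using the well-formed sample XML from this answer as an inline string (with a real file you would start from `ET.parse(...).getroot()`):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Node ToolID="19">
  <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
  <Properties/>
  <ChildNodes>
    <Node ToolID="11"/>
    <Node ToolID="16">
      <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
      <Properties/>
      <ChildNodes>
        <Node ToolID="17">
          <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
          <Properties/>
          <ChildNodes>
            <Node ToolID="2">
              <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput"/>
              <Properties/>
              <EngineSettings EngineDll="AlteryxBasePluginsEngine.dll" EngineDllEntryPoint="AlteryxDbFileInput"/>
            </Node>
          </ChildNodes>
        </Node>
        <Node ToolID="18"/>
      </ChildNodes>
    </Node>
    <Node ToolID="13"/>
    <Node ToolID="20"/>
  </ChildNodes>
</Node>"""

root = ET.fromstring(SAMPLE)

# iter("Node") starts with root itself, so ToolID 19 is not skipped;
# each Node's direct children are the Node elements inside its ChildNodes.
nodes = {
    node.get("ToolID"): [child.get("ToolID") for child in node.findall("./ChildNodes/Node")]
    for node in root.iter("Node")
}
print(nodes)
```

If the root is some other element, `root.iter("Node")` simply does not yield it, so the same loop works for both layouts.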