Find hierarchy in XML structure

I'm trying to find the hierarchy of the ToolID elements and their child elements in my XML, using ElementTree in Python:

<Node ToolID="19">
  <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
  <Properties>
  <ChildNodes>
    <Node ToolID="11">
    <Node ToolID="16">
      <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
      <Properties>
      <ChildNodes>
        <Node ToolID="17">
          <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer">
          <Properties>
          <ChildNodes>
            <Node ToolID="2">
              <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput">
              <Properties>
              <EngineSettings EngineDll="AlteryxBasePluginsEngine.dll" EngineDllEntryPoint="AlteryxDbFileInput" />
            </Node>
          </ChildNodes>
        </Node>
        <Node ToolID="18">
      </ChildNodes>
    </Node>
    <Node ToolID="13">
    <Node ToolID="20">
  </ChildNodes>
</Node>
</Nodes>

The expected output for the ToolIDs looks like this (- means the tool has no child tools): {10:-}, {19:11, 16, 13, 20}, {16:17, 18}, {17:2}, {2:-}, {11:-}, {18:-}, {13:-}, {20:-}

I found a way to do it:

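1. Find all ToolIDs:

import xml.etree.ElementTree as ET

tree = ET.parse("input.xml")  # path to your XML file
root = tree.getroot()

tool_id_list = []
for elm in root.findall(".//Node"):
    if "ToolID" in elm.attrib:  # skip Node elements without a ToolID
        tool_id_list.append(elm.attrib["ToolID"])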
2. Loop over the list of ToolIDs and collect the direct child tools of each one:

node_hierarchy = {tool_id: [] for tool_id in tool_id_list}

def find_child_tools(parent_id_list):
    for parent_id in parent_id_list:
        # A tool's direct children are the Node elements inside its own <ChildNodes>
        path = ".//Node[@ToolID='%s']/ChildNodes/Node" % parent_id
        for child in root.findall(path):
            node_hierarchy[parent_id].append(child.attrib["ToolID"])


find_child_tools(tool_id_list)
print(node_hierarchy)

Output:

{'10': [],
 '19': ['11', '16', '13', '20'],
 '11': ['8'],
 '8': [],
 '16': ['17', '18'],
 '17': ['2'],
 '2': [],
 '18': ['4'],
 '4': [],
 '13': ['12', '14', '15'],
 '12': [],
 '14': [],
 '15': [],
 '20': []}
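The two-step approach runs a separate .//Node[@ToolID=...] search for every ToolID, so the tree is rescanned once per tool. A minimal alternative sketch (not the original poster's code) walks the tree once with the standard-library ElementTree and recurses through each ChildNodes element. It assumes, based on the closing </Nodes> tag in the question, that the top-level Node elements sit directly under a <Nodes> wrapper. SAMPLE and collect are hypothetical names used only for illustration, and GuiSettings/Properties are left out of the sample because they don't affect the result.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's sample, wrapped in <Nodes>
# (assumed wrapper, based on the stray </Nodes> tag in the question).
SAMPLE = """<Nodes>
  <Node ToolID="19">
    <ChildNodes>
      <Node ToolID="11"/>
      <Node ToolID="16">
        <ChildNodes>
          <Node ToolID="17">
            <ChildNodes>
              <Node ToolID="2"/>
            </ChildNodes>
          </Node>
          <Node ToolID="18"/>
        </ChildNodes>
      </Node>
      <Node ToolID="13"/>
      <Node ToolID="20"/>
    </ChildNodes>
  </Node>
</Nodes>"""


def collect(node, hierarchy):
    """Record the direct child tools of node, then recurse into each child."""
    children = node.findall("./ChildNodes/Node")
    hierarchy[node.get("ToolID")] = [child.get("ToolID") for child in children]
    for child in children:
        collect(child, hierarchy)
    return hierarchy


root = ET.fromstring(SAMPLE)
hierarchy = {}
for top in root.findall("./Node"):  # top-level tools directly under <Nodes>
    collect(top, hierarchy)

print(hierarchy)
# {'19': ['11', '16', '13', '20'], '11': [], '16': ['17', '18'], '17': ['2'], '2': [], '18': [], '13': [], '20': []}
```

If the <Nodes> element is nested deeper in your real file, locate it first (for example with root.find(".//Nodes")) and start the loop from there.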

If you can switch from ElementTree to lxml (for better XPath support), it looks like you can simplify it to something like this...


from lxml import etree

tree = etree.parse("input.xml")

nodes = {}

for node in tree.xpath("descendant-or-self::Node"):  # .//Node was not getting the first Node if it was the root element.
    nodes[node.get("ToolID")] = [child.get("ToolID") for child in node.xpath("./ChildNodes/Node")]

print(nodes)

Input XML (I tried to make your sample XML well-formed. Hopefully the structure is still correct.)

<Node ToolID="19">
    <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
    <Properties/>
    <ChildNodes>
        <Node ToolID="11"/>
        <Node ToolID="16">
            <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
            <Properties/>
            <ChildNodes>
                <Node ToolID="17">
                    <GuiSettings Plugin="AlteryxGuiToolkit.ToolContainer.ToolContainer"/>
                    <Properties/>
                    <ChildNodes>
                        <Node ToolID="2">
                            <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput"/>
                            <Properties/>
                            <EngineSettings EngineDll="AlteryxBasePluginsEngine.dll" EngineDllEntryPoint="AlteryxDbFileInput"/>
                        </Node>
                    </ChildNodes>
                </Node>
                <Node ToolID="18"/>
            </ChildNodes>
        </Node>
        <Node ToolID="13"/>
        <Node ToolID="20"/>
    </ChildNodes>
</Node>

Printed output

{'19': ['11', '16', '13', '20'], 
 '11': [], 
 '16': ['17', '18'], 
 '17': ['2'], 
 '2':  [], 
 '18': [], 
 '13': [], 
 '20': []}
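To show the result in the {parent:children} notation used for the expected output in the question, a small formatter is enough. This is a sketch: format_hierarchy is a hypothetical helper name, the nodes dict is copied from the printed output above, and - marks a tool without children, as in the question. The entries come out in document order rather than in the order the question lists them.

```python
def format_hierarchy(nodes):
    """Render {parent: [children]} as '{parent:child, child}, ...',
    using '-' for tools that have no children."""
    parts = []
    for tool_id, children in nodes.items():
        body = ", ".join(children) if children else "-"
        parts.append("{" + tool_id + ":" + body + "}")
    return ", ".join(parts)


nodes = {'19': ['11', '16', '13', '20'], '11': [], '16': ['17', '18'],
         '17': ['2'], '2': [], '18': [], '13': [], '20': []}

print(format_hierarchy(nodes))
# {19:11, 16, 13, 20}, {11:-}, {16:17, 18}, {17:2}, {2:-}, {18:-}, {13:-}, {20:-}
```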

If your actual XML has a root element that is not a Node, you can still use ElementTree, since .//Node only misses the root element itself...

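import xml.etree.ElementTree as ET
tree = ET.parse("input.xml")  # ElementTree instead of lxml; nodes = {} as above
for node in tree.findall(".//Node"):
    nodes[node.get("ToolID")] = [child.get("ToolID") for child in node.findall("./ChildNodes/Node")]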
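Even when the root element is itself a Node, plain ElementTree can include it: Element.iter(tag) yields the element it is called on (if its tag matches) followed by all matching descendants in document order, which behaves like lxml's descendant-or-self::Node here. A minimal sketch using a trimmed copy of the sample above (GuiSettings, Properties and EngineSettings omitted because they don't affect the result; SAMPLE is a hypothetical name). With a real file, ET.parse("input.xml").getroot() would take the place of ET.fromstring(SAMPLE).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the answer's sample: the root element is itself a Node.
SAMPLE = """<Node ToolID="19">
  <ChildNodes>
    <Node ToolID="11"/>
    <Node ToolID="16">
      <ChildNodes>
        <Node ToolID="17">
          <ChildNodes>
            <Node ToolID="2"/>
          </ChildNodes>
        </Node>
        <Node ToolID="18"/>
      </ChildNodes>
    </Node>
    <Node ToolID="13"/>
    <Node ToolID="20"/>
  </ChildNodes>
</Node>"""

root = ET.fromstring(SAMPLE)

# root.iter("Node") starts with root itself, so ToolID 19 is included.
nodes = {
    node.get("ToolID"): [child.get("ToolID") for child in node.findall("./ChildNodes/Node")]
    for node in root.iter("Node")
}

print(nodes)
# {'19': ['11', '16', '13', '20'], '11': [], '16': ['17', '18'], '17': ['2'], '2': [], '18': [], '13': [], '20': []}
```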