XML to dict Python with same tags that have multiple keys values

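Many solutions have been proposed for converting XML to a dict, but none of them solves my particular use case.

My XML contains many tags with the same name, but each tag may carry many key-value pairs, and not every tag has the same number of them. That is what makes it challenging.

For example: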

<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="xxx.xxx.com" modified="2021-06-14T07:52:04.437Z" agent="xxx" version="12.4.8" etag="o-cccc" type="device">
  <diagram id="asdfsdf">
    <mxGraphModel dx="1213" dy="2767" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="2" value="label_1" style="points=[[0,0],[0.25,0],[0.5,0],[0.75,0],[1,0],[1,0.25],[1,0.5],[1,0.75],[1,1],[0.75,1],[0.5,1],[0.25,1],[0,1],[0,0.75],[0,0.5],[0,0.25]];outlineConnect=0;gradientColor=none;html=1;whiteSpace=wrap;fontSize=17;fontStyle=0;shape=shape_1;grIcon=icon_1;strokeColor=#232F3E;fillColor=none;verticalAlign=top;align=left;spacingLeft=30;fontColor=#232F3E;dashed=0;" vertex="1" parent="1">
          <mxGeometry x="110" y="-50" width="1170" height="840" as="geometry"/>
        </mxCell>
        <mxCell id="3" value="Region" style="points=[[0,0],[0.25,0],[0.5,0],[0.75,0],[1,0],[1,0.25],[1,0.5],[1,0.75],[1,1],[0.75,1],[0.5,1],[0.25,1],[0,1],[0,0.75],[0,0.5],[0,0.25]];outlineConnect=0;gradientColor=none;html=1;whiteSpace=wrap;fontSize=17;fontStyle=0;shape=shape_1;grIcon=icon_2;strokeColor=#147EBA;fillColor=none;verticalAlign=top;align=left;spacingLeft=30;fontColor=#147EBA;dashed=0;" vertex="1" parent="1">
          <mxGeometry x="290" y="190" width="960" height="580" as="geometry"/>
        </mxCell>
        <mxCell id="4" value="Area 1" style="fillColor=none;strokeColor=#147EBA;dashed=1;verticalAlign=top;fontStyle=0;fontColor=#147EBA;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="750" y="340" width="320" height="420" as="geometry"/>
        </mxCell>
        <mxCell id="5" value="Area 1" style="fillColor=none;strokeColor=#147EBA;dashed=1;verticalAlign=top;fontStyle=0;fontColor=#147EBA;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="326" y="340" width="364" height="420" as="geometry"/>
        </mxCell>
        <mxCell id="6" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="7" target="9" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="7" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#232F3E;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="698.43" y="-110" width="34" height="34" as="geometry"/>
        </mxCell>
        <mxCell id="8" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="9" target="35" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="9" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#945DF2;gradientDirection=north;fillColor=#5A30B5;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_1;resIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="675.43" y="-30" width="80" height="80" as="geometry"/>
        </mxCell>
        <mxCell id="24" value="&lt;font style=&quot;font-size: 15px&quot;&gt;Service name 1&lt;/font&gt;" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;strokeColor=#ffffff;fillColor=#232F3E;dashed=0;verticalLabelPosition=middle;verticalAlign=bottom;align=center;html=1;whiteSpace=wrap;fontSize=17;fontStyle=1;spacing=3;shape=shape_2;prIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="159" width="62" height="100" as="geometry"/>
        </mxCell>
        <mxCell id="25" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#D86613;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="817.1399999999999" y="383" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="26" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="27" target="28" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="27" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#4D72F3;gradientDirection=north;fillColor=#3334B9;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_4;resIcon=service_3;" vertex="1" parent="1">
          <mxGeometry x="473" y="640" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="28" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#4D72F3;gradientDirection=north;fillColor=#3334B9;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_4;resIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="885.2899999999998" y="639" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="29" value="Primary&lt;br style=&quot;font-size: 17px;&quot;&gt;(Multi-area)" style="text;html=1;resizable=0;autosize=1;align=center;verticalAlign=middle;points=[];fillColor=none;strokeColor=none;rounded=0;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="458" y="701" width="90" height="50" as="geometry"/>
        </mxCell>
        <mxCell id="30" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#D86613;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="385" y="380" width="68" height="68" as="geometry"/>
        </mxCell>
        <mxCell id="31" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=classic;endFill=1;fontSize=17;" edge="1" source="32" target="12" parent="1">
          <mxGeometry relative="1" as="geometry">
            <Array as="points">
              <mxPoint x="503" y="550"/>
              <mxPoint x="1160" y="550"/>
            </Array>
          </mxGeometry>
        </mxCell>
        <mxCell id="32" value="&lt;font style=&quot;font-size: 17px&quot;&gt;web component&lt;br style=&quot;font-size: 17px&quot;&gt;(&lt;b&gt;Service name 1&lt;br&gt;WordPress Instance&lt;/b&gt;)&lt;/font&gt;" style="text;html=1;resizable=0;autosize=1;align=center;verticalAlign=middle;points=[];fillColor=none;strokeColor=none;rounded=0;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="413" y="452" width="180" height="70" as="geometry"/>
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>

Indicators: arrows are marked by "edgeStyle=orthogonalEdgeStyle"; objects (services) are marked by "resIcon=service_2", service_1, and so on.
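
As a rough sketch of how these indicators could be picked out of a single `style` string: the helper name `parse_style` is made up for illustration, and the two sample strings are shortened versions of the ones in the XML above.

```python
def parse_style(style):
    """Split a 'k1=v1;k2=v2;' style string into a dict (hypothetical helper)."""
    result = {}
    for part in style.split(';'):
        # Skip empty pieces and bare flags such as 'text' that have no '='.
        if '=' in part:
            key, value = part.split('=', 1)
            result[key] = value
    return result

# Shortened sample values taken from the mxCell elements above.
edge_style = "edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;"
service_style = "outlineConnect=0;aspect=fixed;shape=shape_1;resIcon=service_2;"

print('edgeStyle' in parse_style(edge_style))     # True: this cell is an arrow
print(parse_style(service_style).get('resIcon'))  # service_2
```

Turning the style into a dict first means each indicator becomes a simple key lookup instead of a chain of string comparisons.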

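What I have done so far: using xml.etree.ElementTree, I loop over the tree, pull out each tag and its attributes, and extract the values that match the keywords I am looking for.

The results are stored in arrays: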

id = []
attrb = []
objects_found = []
arrows_found = []

In the end I want to convert them into dict objects like this:

{
   id: '1',
   attrb: 'service',
   object: 'service_a',
   arrows: true,
   arrow_start: {coordinate},
   arrow_end: {coordinate}
}

If there is no arrow:

{
   id: '1',
   attrb: 'service',
   object: 'service_a'
}

My code:

# `tree` is the xml.etree.ElementTree tree parsed from the XML above;
# `id`, `attrb`, `objects_found` and `arrows_found` are the lists defined earlier.
for item in tree.iter():
    if item.tag == 'mxCell':
        cell_id = item.attrib['id']
        # Split the long ';'-separated list in the 'style' attribute. Most of the info is in there.
        # Some cells (e.g. ids "0" and "1") have no 'style', so fall back to an empty string.
        style_list = item.attrib.get('style', '').split(';')
        for style in style_list:
            if '=' in style:
                style_key, style_value = style.split('=', 1)
                if style_key == 'shape' and style_value != 'icon' and 'keyword-a' in style_value:
                    service_icon = style_value
                    id.append(cell_id)
                    attrb.append("service_name")
                    objects_found.append(service_icon)
                elif style_key == 'resIcon':
                    service_icon = style_value
                    id.append(cell_id)
                    attrb.append("service_name")
                    objects_found.append(service_icon)
                elif style_key == 'edgeStyle':
                    arrow_style = style_value
                    id.append(cell_id)
                    attrb.append("arrows")
                    arrows_found.append(arrow_style)

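I have tried:

  1. dict(zip()). The problem is that some optional keys may not exist for some IDs.
  2. A pandas DataFrame (not ideal, since I want a dict in the end). I tried to get a table form via CSV, but the arrays do not work there either: once the values are put into separate arrays, the relationship between the ids and their key-values is lost, and arrays of different lengths cannot be combined into one DataFrame.

Any good suggestions for a solution?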

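I finally found a simple way.

Keeping the current logic, I can still extract the data by keyword, and each key-value pair I identify gets added to a nested dictionary.

Start with dictObj = {}, and inside the for loop create a nested dictionary for each id: dictObj[id] = {}

Each time a key is identified, follow up with dictObj[id].update({'key': value})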
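
Put together, the two steps above might look roughly like the sketch below. The inline `XML` string is a trimmed-down stand-in for the real file, and the key names `attrb`, `object` and `arrows` are borrowed from the desired output in the question; the `keyword-a` shape check is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the XML in the question (illustration only).
XML = """<root>
  <mxCell id="0"/>
  <mxCell id="6" style="edgeStyle=orthogonalEdgeStyle;rounded=0;" edge="1" source="7" target="9" parent="1"/>
  <mxCell id="9" style="aspect=fixed;shape=shape_1;resIcon=service_2;" vertex="1" parent="1"/>
</root>"""

dictObj = {}
for item in ET.fromstring(XML).iter('mxCell'):
    cell_id = item.attrib['id']
    dictObj[cell_id] = {}  # one nested dict per id
    # Cells without a 'style' attribute simply contribute no keys.
    for style in item.attrib.get('style', '').split(';'):
        if '=' not in style:
            continue
        style_key, style_value = style.split('=', 1)
        if style_key == 'resIcon':
            dictObj[cell_id].update({'attrb': 'service_name', 'object': style_value})
        elif style_key == 'edgeStyle':
            dictObj[cell_id].update({'attrb': 'arrows', 'arrows': True})

print(dictObj)
# {'0': {}, '6': {'attrb': 'arrows', 'arrows': True}, '9': {'attrb': 'service_name', 'object': 'service_2'}}
```

Because every id owns its own dict, optional keys are just absent where they do not apply, which sidesteps the unequal-length problem of the parallel arrays.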

I am not sure this is the most efficient approach, but at least I get the output I want. If anyone has a better way, please share.
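
One possible variation, offered as a sketch rather than a definitive answer, is to index all cells by id first so that an arrow's `source` and `target` can be looked up. It assumes that `arrow_start` and `arrow_end` can be taken from the `x`/`y` of the source and target cells' `mxGeometry`, which the question does not confirm. The helpers `parse_style` and `position` are hypothetical names, and the inline XML is again a cut-down sample.

```python
import xml.etree.ElementTree as ET

# Cut-down sample based on cells 6, 7 and 9 from the question.
XML = """<root>
  <mxCell id="6" style="edgeStyle=orthogonalEdgeStyle;" edge="1" source="7" target="9" parent="1">
    <mxGeometry relative="1" as="geometry"/>
  </mxCell>
  <mxCell id="7" style="shape=shape_3;" vertex="1" parent="1">
    <mxGeometry x="698.43" y="-110" width="34" height="34" as="geometry"/>
  </mxCell>
  <mxCell id="9" style="shape=shape_1;resIcon=service_2;" vertex="1" parent="1">
    <mxGeometry x="675.43" y="-30" width="80" height="80" as="geometry"/>
  </mxCell>
</root>"""

def parse_style(style):
    """Turn 'k1=v1;k2=v2;' into a dict (hypothetical helper)."""
    return dict(part.split('=', 1) for part in style.split(';') if '=' in part)

def position(cell):
    """Return (x, y) of the cell's mxGeometry as floats, or None if either is missing."""
    geo = cell.find('mxGeometry')
    if geo is None or 'x' not in geo.attrib or 'y' not in geo.attrib:
        return None
    return float(geo.attrib['x']), float(geo.attrib['y'])

# Index every cell by id so edges can resolve their endpoints.
cells = {c.attrib['id']: c for c in ET.fromstring(XML).iter('mxCell')}

records = []
for cell_id, cell in cells.items():
    style = parse_style(cell.attrib.get('style', ''))
    if 'edgeStyle' in style:
        record = {'id': cell_id, 'attrb': 'arrows', 'arrows': True}
        # Assumption: an arrow's coordinates are those of its source/target cells.
        source = cells.get(cell.attrib.get('source'))
        target = cells.get(cell.attrib.get('target'))
        if source is not None:
            record['arrow_start'] = position(source)
        if target is not None:
            record['arrow_end'] = position(target)
        records.append(record)
    elif 'resIcon' in style:
        records.append({'id': cell_id, 'attrb': 'service_name', 'object': style['resIcon']})

print(records)
```

A source or target id that does not exist in the file (the full sample refers to ids 35 and 12, which are not present) simply leaves the corresponding key out of the record.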