Python - using element tree to get data from specific nodes in xml

I have been searching around, and there are many similar questions, but unfortunately none of them solved my problem.

My XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
      </Settings>
    </Node>
  </Nodes>

Using Python, I am trying to get the values of Text Box (1) and Text Box (2) for every node whose "AdvSettings" is switched on.

So in this case, I would like a result like this:

ComponentID  State  Textbox1  Textbox2
1            On     SettingA  SettingB
3            On     SettingG  SettingH

I have made a few attempts but have not gotten far. With this I managed to get the AdvSettings tags, but that is as far as I got:

import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch.xml')
root = tree.getroot()

for AdvSettings in root.iter('AdvSettings'):
    print(AdvSettings.tag, AdvSettings.attrib)

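You can use XPath to find all the relevant nodes and then extract the data you need from them. An example is shown below (the comments serve as explanation).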

from lxml import etree

xml = etree.fromstring('''
  <Nodes>...
  </Nodes>
''')

# Use XPath to select the relevant nodes

on_nodes = xml.xpath("//Node[Settings[AdvSettings[@State='Yes' or @State='On']]]")

# Get all needed information from every node
data_collected = [dict(
    [("ComponentID", node.attrib['ComponentID'])] +
    # Iterate the <Settings> element directly; getchildren() is deprecated
    [(c.get("name"), c.text) for c in node.find("Settings") if c.text]) for node in on_nodes]


# You now have a list of dicts with all the relevant information.
# Print it out; I used pandas for formatting, which is optional
# (to_markdown also needs the tabulate package installed).
import pandas
print(pandas.DataFrame.from_records(data_collected).to_markdown(index=False))

This gives you output like this:

|   ComponentID | Text Box (1)   | Text Box (2)   | Text Box (3)   | Text Box (4)   |
|--------------:|:---------------|:---------------|:---------------|:---------------|
|             1 | SettingA       | SettingB       | SettingC       | SettingD       |
|             3 | SettingG       | SettingH       | SettingI       | SettingJ       |

See below (using Python's core xml library):

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
      </Settings>
    </Node>
  </Nodes>''' 

data = []
root = ET.fromstring(xml)
nodes = root.findall('.//Node')
for node in nodes:
    adv = node.find('.//AdvSettings')
    if adv is None:
        continue
    # Treat a missing State attribute as 'Off'
    flag = adv.attrib.get('State', 'Off')
    if flag in ('On', 'Yes'):
        data.append({
            'id': node.attrib.get('ComponentID'),
            'txt_box_1': node.find('.//Value[@name="Text Box (1)"]').text.strip(),
            'txt_box_2': node.find('.//Value[@name="Text Box (2)"]').text.strip(),
        })

df = pd.DataFrame(data)
print(df)

Output:

  id txt_box_1 txt_box_2
0  1  SettingA  SettingB
1  3  SettingG  SettingH
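
If you also want the State column from the question and would rather not depend on pandas, the same idea can be sketched with the standard library alone. This is only a sketch under a few assumptions: the helper name collect_rows, the enabled_states parameter, and the trimmed SAMPLE_XML (Text Box 3 and 4 left out) are made up for illustration, and both "On" and "Yes" are treated as switched on, as in the answers above. Note that it reports the State attribute exactly as it appears in the XML, so node 3 shows "Yes".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (Text Box 3 and 4 omitted for brevity).
SAMPLE_XML = """<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <Value name="Text Box (2)"> SettingH </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>"""


def collect_rows(xml_text, enabled_states=("On", "Yes")):
    """Return one dict per Node whose AdvSettings State is in enabled_states.

    Hypothetical helper: the name and the enabled_states default are
    assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("Node"):
        adv = node.find("Settings/AdvSettings")
        # Skip nodes without AdvSettings or with a state that is not enabled
        if adv is None or adv.get("State") not in enabled_states:
            continue
        row = {"ComponentID": node.get("ComponentID"), "State": adv.get("State")}
        for i in (1, 2):
            # findtext returns the default if the Value element is missing
            text = node.findtext(
                f"Settings/Value[@name='Text Box ({i})']", default=""
            )
            row[f"Textbox{i}"] = text.strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    columns = ["ComponentID", "State", "Textbox1", "Textbox2"]
    print("  ".join(f"{c:<11}" for c in columns))
    for row in collect_rows(SAMPLE_XML):
        print("  ".join(f"{row[c]:<11}" for c in columns))
```

Using findtext with a default avoids an AttributeError when a node lacks one of the text boxes, which the direct .text.strip() calls above would raise.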