Python - using element tree to get data from specific nodes in xml
I have been looking around and there are plenty of similar questions, but unfortunately none of them solved my problem.
My XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <Value name="Text Box (3)"> SettingC </Value>
      <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <Value name="Text Box (3)"> SettingC </Value>
      <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <Value name="Text Box (2)"> SettingH </Value>
      <Value name="Text Box (3)"> SettingI </Value>
      <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>
Using Python, I am trying to get the values of Text Box 1 and Text Box 2 for every node whose "AdvSettings" is switched on.
So in this case I would like a result like this:
ComponentID State Textbox1 Textbox2
1 On SettingA SettingB
3 On SettingG SettingH
I have made a few attempts but have not gotten very far. With the code below I managed to get the AdvSettings tags, but that is as far as I got:
import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch.xml')
root = tree.getroot()
# Print every AdvSettings element together with its attributes
for adv_settings in root.iter('AdvSettings'):
    print(adv_settings.tag, adv_settings.attrib)
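You can use XPath to find all the relevant nodes and then extract the data you need from them. An example is shown below (the comments serve as explanation).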
from lxml import etree
xml = etree.fromstring('''
<Nodes>...
</Nodes>
''')
# Use XPath to select the Node elements whose AdvSettings State is 'Yes' or 'On'
on_nodes = xml.xpath("//Node[Settings[AdvSettings[@State='Yes' or @State='On']]]")
# Build one dict per node: the ComponentID plus every child of Settings that has text
data_collected = [
    dict(
        [("ComponentID", node.attrib['ComponentID'])] +
        # Iterate over the element directly; getchildren() is deprecated
        [(c.get("name"), c.text.strip()) for c in node.find("Settings") if c.text]
    )
    for node in on_nodes
]
# data_collected is now a list of dicts with all the relevant information.
# Printing it with pandas is optional; to_markdown() needs the tabulate package.
import pandas
print(pandas.DataFrame.from_records(data_collected).to_markdown(index=False))
This will give you output like this:
| ComponentID | Text Box (1) | Text Box (2) | Text Box (3) | Text Box (4) |
|--------------:|:---------------|:---------------|:---------------|:---------------|
| 1 | SettingA | SettingB | SettingC | SettingD |
| 3 | SettingG | SettingH | SettingI | SettingJ |
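If lxml is not an option, a similar selection can be sketched with the standard library. This is only a minimal sketch under a few assumptions: `xml.etree.ElementTree` supports a subset of XPath with no `or` inside a predicate, so it runs one query per accepted state and uses `..` to climb from each matching AdvSettings element to its Node. The shortened `SAMPLE` document and the `enabled_nodes` helper are made up for illustration, and the sort assumes ComponentID values are integers.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (illustrative only).
SAMPLE = """
<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>
"""


def enabled_nodes(root, states=("On", "Yes")):
    """Return Node elements whose AdvSettings State is one of `states`."""
    found = []
    for state in states:
        # Each ".." moves one level up: AdvSettings -> Settings -> Node.
        found.extend(root.findall(f".//AdvSettings[@State='{state}']/../.."))
    # Restore ComponentID order (assumes the IDs are integers).
    return sorted(found, key=lambda node: int(node.get("ComponentID")))


root = ET.fromstring(SAMPLE)
ids = [node.get("ComponentID") for node in enabled_nodes(root)]
print(ids)  # ['1', '3']
```

The full XPath expression in the lxml version is more compact. The trade-off here is one query per state in exchange for having no third-party dependency.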
See below (using Python's core XML library):
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0" encoding="utf-8"?>
<Nodes>
<Node ComponentID="1">
<Settings>
<Value name="Text Box (1)"> SettingA </Value>
<Value name="Text Box (2)"> SettingB </Value>
<Value name="Text Box (3)"> SettingC </Value>
<Value name="Text Box (4)"> SettingD </Value>
<AdvSettings State="On"/>
</Settings>
</Node>
<Node ComponentID="2">
<Settings>
<Value name="Text Box (1)"> SettingA </Value>
<Value name="Text Box (2)"> SettingB </Value>
<Value name="Text Box (3)"> SettingC </Value>
<Value name="Text Box (4)"> SettingD </Value>
<AdvSettings State="Off"/>
</Settings>
</Node>
<Node ComponentID="3">
<Settings>
<Value name="Text Box (1)"> SettingG </Value>
<Value name="Text Box (2)"> SettingH </Value>
<Value name="Text Box (3)"> SettingI </Value>
<Value name="Text Box (4)"> SettingJ </Value>
<AdvSettings State="Yes"/>
</Settings>
</Node>
</Nodes>'''
data = []
root = ET.fromstring(xml)
nodes = root.findall('.//Node')
for node in nodes:
    adv = node.find('.//AdvSettings')
    if adv is None:
        continue
    # Treat a missing State attribute as 'Off'
    flag = adv.attrib.get('State', 'Off')
    if flag in ('On', 'Yes'):
        data.append({
            'id': node.attrib.get('ComponentID'),
            'txt_box_1': node.find('.//Value[@name="Text Box (1)"]').text.strip(),
            'txt_box_2': node.find('.//Value[@name="Text Box (2)"]').text.strip(),
        })
df = pd.DataFrame(data)
print(df)
Output:
id txt_box_1 txt_box_2
0 1 SettingA SettingB
1 3 SettingG SettingH
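Neither snippet above prints the State column that the question asked for. The following is a minimal sketch, using only the standard library, that collects ComponentID, State, and the first two text boxes and prints them as a plain table. The `extract_rows` and `text_of` helpers, the `ENABLED_STATES` set, and the shortened `SAMPLE` document are illustrative names, not part of the original posts. Treating both "On" and "Yes" as enabled is an assumption carried over from the answers.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (illustrative only).
SAMPLE = """
<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <Value name="Text Box (2)"> SettingH </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>
"""

ENABLED_STATES = {"On", "Yes"}  # assumption: both values mean "enabled"


def text_of(node, box_name):
    """Return the stripped text of the named Value element, or '' if absent."""
    value = node.find(f".//Value[@name='{box_name}']")
    if value is None or value.text is None:
        return ""
    return value.text.strip()


def extract_rows(root):
    """Collect (ComponentID, State, Textbox1, Textbox2) for enabled nodes."""
    rows = []
    for node in root.iter("Node"):
        adv = node.find("Settings/AdvSettings")
        state = adv.get("State") if adv is not None else None
        if state in ENABLED_STATES:
            rows.append((
                node.get("ComponentID"),
                state,
                text_of(node, "Text Box (1)"),
                text_of(node, "Text Box (2)"),
            ))
    return rows


rows = extract_rows(ET.fromstring(SAMPLE))
header = ("ComponentID", "State", "Textbox1", "Textbox2")
for row in [header] + rows:
    # Left-align each column to a fixed width.
    print("{:<12}{:<7}{:<10}{:<10}".format(*row))
```

Note that node 3 is reported with State `Yes`, as it appears in the XML, whereas the desired output in the question lists it as `On`. The `text_of` helper returns an empty string instead of raising when a text box is missing, which may or may not be the behavior you want.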