Parsing XML by specifying name of child where multiple exist
I'm having trouble extrapolating similar SO threads to a larger XML file in which multiple children have different names. For example, this is a subset of the file I'm working with:
<?xml version="1.0" encoding="UTF-8"?>
<SDDXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<RulesetFilename file="T24N_2022.bin"/>
<Model Name="Proposed">
...
<Model Name="Standard">
<Proj>
<Name>project0001</Name>
<DevMode>1</DevMode>
<BldgEngyModelVersion>16</BldgEngyModelVersion>
<AnalysisVersion>220070</AnalysisVersion>
<CreateDate>1650049043</CreateDate>
<EnergyUse>
..
<EnergyUse>
<Name>Efficiency Compliance</Name>
<EnduseName>Efficiency Compliance</EnduseName>
<ProposedTDV index="0">270.095</ProposedTDV>
<StandardTDV index="0">99.089</StandardTDV>
...
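I'm trying to get the value of 'ProposedTDV', which is 270.095. I've tried BeautifulSoup and ElementTree, but I can't find the syntax for specifying the child name. That is, I can't use a search string like this, because it matches more than one node: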
Model/Proj/EnergyUse/ProposedTDV
I'm looking for something more like:
Model[Name="Standard"]/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV
or something similar that I can use with BeautifulSoup (or any other XML parser).
For example, I tried:
from bs4 import BeautifulSoup
result = open(--xml_file_path--,'r')
contents = result.read()
soup = BeautifulSoup(contents,'xml')
test = soup.Model[Name="Proposed"].Proj.EnergyUse[Name='Efficiency Compliance'].findAll("ProposedTDV")
But I know the syntax there is wrong.
Take a look at [Python.Docs]: xml.etree.ElementTree - Supported XPath syntax.
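To make the relevant part of that page concrete, here is a minimal sketch of the two predicate forms that matter for this question: [@attrib='value'] tests an attribute of the element itself, while [tag='text'] tests the text of a child element. The inline XML is made up for illustration and only mirrors the structure from the question.

```python
from xml.etree import ElementTree as ET

# Made-up document for illustration; it only mirrors the structure in the question.
XML = """
<SDDXML>
  <Model Name="Proposed">
    <Proj>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">1.618</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
  <Model Name="Standard">
    <Proj>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">270.095</ProposedTDV>
      </EnergyUse>
      <EnergyUse>
        <Name>Other</Name>
        <ProposedTDV index="0">3.14</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
</SDDXML>
"""

root = ET.fromstring(XML)

# [@Name='Standard'] filters Model elements on their Name attribute.
by_attribute = [
    node.text
    for node in root.findall("./Model[@Name='Standard']/Proj/EnergyUse/ProposedTDV")
]

# [Name='Efficiency Compliance'] filters EnergyUse elements on the text of their Name child.
by_child_text = [
    node.text
    for node in root.findall("./Model/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV")
]

# Both predicates combined narrow the result to a single node.
both = [
    node.text
    for node in root.findall(
        "./Model[@Name='Standard']/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV"
    )
]

print(by_attribute)
print(by_child_text)
print(both)
```

Note the difference between Name with and without the leading @: in this file Model carries Name as an attribute, whereas EnergyUse carries it as a child element.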
I saved your XML and enhanced it a bit (fixed the errors and added some dummy nodes) in order to have a working example.
blob00.xml:
<?xml version="1.0" encoding="UTF-8"?>
<SDDXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <RulesetFilename file="T24N_2022.bin"/>
  <Model Name="Proposed">
    <Proj>
      <!-- Other nodes -->
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">1.618</ProposedTDV>
        <StandardTDV index="0">9.809</StandardTDV>
      </EnergyUse>
    </Proj>
  </Model>
  <!-- Other nodes -->
  <Model Name="Standard">
    <Proj>
      <Name>project0001</Name>
      <DevMode>1</DevMode>
      <BldgEngyModelVersion>16</BldgEngyModelVersion>
      <AnalysisVersion>220070</AnalysisVersion>
      <CreateDate>1650049043</CreateDate>
      <EnergyUse/>
      <!-- Other nodes -->
      <EnergyUse>
        <!-- Only this one should be selected! -->
        <Name>Efficiency Compliance</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">270.095</ProposedTDV>
        <StandardTDV index="0">99.089</StandardTDV>
      </EnergyUse>
      <EnergyUse>
        <Name>Some name that SHOULD NOT MATCH</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">3.141593</ProposedTDV>
        <StandardTDV index="0">2.718282</StandardTDV>
      </EnergyUse>
    </Proj>
  </Model>
</SDDXML>
code00.py:
#!/usr/bin/env python

import sys
from xml.etree import ElementTree as ET


def main(*argv):
    doc = ET.parse("./blob00.xml")
    root = doc.getroot()
    # Attribute predicate on Model, child-text predicate on EnergyUse
    search_xpath = "./Model[@Name='Standard']/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV"
    # Below are different (less restrictive) filters. Uncomment each and see the differences
    #search_xpath = "./Model/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV"
    #search_xpath = "./Model[@Name='Standard']/Proj/EnergyUse/ProposedTDV"
    #search_xpath = "./Model/Proj/EnergyUse/ProposedTDV"
    for proposedtdv_node in root.iterfind(search_xpath):
        print("{:}\nText: {:s}".format(proposedtdv_node, proposedtdv_node.text))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q071929246]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" code00.py
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
<Element 'ProposedTDV' at 0x00000188CBBEE900>
Text: 270.095
Done.
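If only a single value is needed, findtext can replace the loop. The following is a sketch rather than a definitive implementation: get_proposed_tdv is a hypothetical helper name, the inline XML is a made-up reduction of blob00.xml, and the string interpolation assumes that neither name contains a single quote.

```python
from typing import Optional
from xml.etree import ElementTree as ET

# Made-up reduction of blob00.xml, kept inline so the sketch is self-contained.
XML = """
<SDDXML>
  <Model Name="Proposed">
    <Proj>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">1.618</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
  <Model Name="Standard">
    <Proj>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">270.095</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
</SDDXML>
"""


def get_proposed_tdv(root: ET.Element, model_name: str, energy_use_name: str) -> Optional[float]:
    """Hypothetical helper: return ProposedTDV as a float, or None if nothing matches."""
    # Assumption: the names contain no single quote, otherwise the predicate breaks.
    path = (
        f"./Model[@Name='{model_name}']/Proj"
        f"/EnergyUse[Name='{energy_use_name}']/ProposedTDV"
    )
    # findtext returns the text of the first match, or None when there is no match.
    text = root.findtext(path)
    if text is None or not text.strip():
        return None
    return float(text)


root = ET.fromstring(XML)
print(get_proposed_tdv(root, "Standard", "Efficiency Compliance"))
print(get_proposed_tdv(root, "Proposed", "Efficiency Compliance"))
print(get_proposed_tdv(root, "Standard", "No such entry"))
```

Returning None for a missing node keeps "not found" distinguishable from a legitimate value of 0.0.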