Parsing an ONIX XML with conditions using Python lxml
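I'm trying to extract some information from an ONIX XML format file using the Python lxml parser.
Among other things, the part of the document I'm interested in looks like this: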
<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
      <SupplierName>EGEN</SupplierName>
    </Supplier>
    <ProductAvailability>40</ProductAvailability>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <Tax>
        <TaxType>01</TaxType>
        <TaxRateCode>Z</TaxRateCode>
        <TaxRatePercent>0</TaxRatePercent>
        <TaxableAmount>0.00</TaxableAmount>
        <TaxAmount>0.00</TaxAmount>
      </Tax>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <Tax>
        <TaxType>01</TaxType>
        <TaxRateCode>Z</TaxRateCode>
        <TaxRatePercent>0</TaxRatePercent>
        <TaxableAmount>0.00</TaxableAmount>
        <TaxAmount>0.00</TaxAmount>
      </Tax>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
I need to pick the price amount that meets the following conditions:
PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05'
I tried:
price = p.find(
"ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' \
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount").text
For some reason, my XPath with the and operator doesn't work, and I get the following error:
File "<string>", line unknown
SyntaxError: invalid predicate
Any idea how to handle this?
Any help is greatly appreciated!
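TL;DR: Use xpath(), because the find*() methods don't support boolean operators such as and.
For your (fairly complex) XPath expression, use the xpath() method that lxml provides on elements and element trees.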
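A minimal sketch of the difference, assuming a trimmed, namespace-free copy of the sample above (the <Tax> and <ProductAvailability> elements are left out) and a path evaluated relative to <ProductSupply>; SAMPLE and PATH are illustrative names. find() should reject the predicate with the same SyntaxError as in the question, while xpath() returns the matching element:

```python
from lxml import etree

# Trimmed copy of the question's sample (no XML declaration, so a str is accepted).
SAMPLE = """
<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
      <SupplierName>EGEN</SupplierName>
    </Supplier>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

root = etree.fromstring(SAMPLE.strip())  # root is the <ProductSupply> element

# Path relative to <ProductSupply>, with "and" inside the predicate.
PATH = ("SupplyDetail[Supplier/SupplierRole='03']"
        "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
        "/PriceAmount")

# find() only understands the limited ElementPath syntax, so it raises SyntaxError.
try:
    root.find(PATH)
    find_error = None
except SyntaxError as exc:
    find_error = str(exc)
print("find() raised:", find_error)

# xpath() implements full XPath 1.0 and returns a list of matching elements.
matches = root.xpath(PATH)
print("xpath() found:", [m.text for m in matches])
```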
XPath
Your XPath expression contains node tests and predicates that use the boolean operator and (XPath 1.0):
ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount
Tip: test it online (see the Xpather demo). It confirms that the expression finds exactly one element, <PriceAmount>0.00</PriceAmount>, as expected.
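Using the find() methods
According to the Python documentation, the following find methods accept a match expression (a subset of XPath) as an argument: find(), findall(), findtext() and iterfind().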
Problem: limited XPath syntax support in find()
However, their supported XPath syntax is limited!
This limitation covers logical operators such as your and. Karl Thornton explains this on his page "XML parsing: Python ~ XPath ~ logical AND | Shiori".
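If you want to stay with find(), one possible workaround (a different technique from a single and predicate) is to chain simple [tag='text'] predicates, which the ElementPath subset does accept. Each predicate filters the result of the previous one, so together they behave like and. A nested test such as [Supplier/SupplierRole='03'] is not valid in a find() predicate (the tag must be a direct child), so this sketch checks the supplier role in Python. find_price_amount, PRICE_PATH and the trimmed SAMPLE are hypothetical names:

```python
from lxml import etree

# Trimmed, namespace-free copy of the question's sample.
SAMPLE = """
<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
    </Supplier>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

root = etree.fromstring(SAMPLE.strip())

# Chained [tag='text'] predicates: each one narrows the previous result ("and").
PRICE_PATH = "Price[PriceType='02'][CurrencyCode='NOK'][PriceQualifier='05']/PriceAmount"


def find_price_amount(product_supply):
    """Return the first matching PriceAmount text, or None (hypothetical helper)."""
    for supply_detail in product_supply.iterfind("SupplyDetail"):
        # The supplier-role condition cannot be a find() predicate, so test it here.
        if supply_detail.findtext("Supplier/SupplierRole") != "03":
            continue
        amount = supply_detail.findtext(PRICE_PATH)
        if amount is not None:
            return amount
    return None


print(find_price_amount(root))
```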
On the other hand, a note in the lxml documentation favors them:
The .find*() methods are usually faster than the full-blown XPath support. They also support incremental tree processing through the .iterfind() method, whereas XPath always collects all results before returning them. They are therefore recommended over XPath for both speed and memory reasons, whenever there is no need for highly selective XPath queries.
(emphasis mine)
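To illustrate the incremental-processing point from that note, here is a sketch assuming the same trimmed sample. iterfind() (again with chained simple predicates, since and is still unavailable there) returns a lazy iterator, so you can take just the first match, whereas xpath() hands back a complete list:

```python
from lxml import etree

# Trimmed, namespace-free copy of the question's sample.
SAMPLE = """
<ProductSupply>
  <SupplyDetail>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

root = etree.fromstring(SAMPLE.strip())

# ElementPath form: chained simple predicates instead of "and".
PATH = "SupplyDetail/Price[PriceType='02'][CurrencyCode='NOK'][PriceQualifier='05']/PriceAmount"

# iterfind() yields matches one at a time; next() stops after the first one.
first = next(root.iterfind(PATH), None)
print(first.text if first is not None else None)

# xpath() builds the full result list before returning it.
all_matches = root.xpath(
    "SupplyDetail/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount"
)
print(type(all_matches), len(all_matches))
```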
Using lxml's xpath()
So let's start with the safer and richer xpath() functionality (before any premature optimization). For example:
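from lxml import etree

# the node predicates to apply within XPath
sd_predicate = "[Supplier/SupplierRole='03']"
p_predicate = "[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
# build the XPath including the predicates with an f-string; the leading "/" makes it
# absolute, because <ProductSupply> is the root element of the sample document
pa_xpath = f"/ProductSupply/SupplyDetail{sd_predicate}/Price{p_predicate}/PriceAmount"
print("Using XPath:", pa_xpath)  # remove after debugging

tree = etree.parse("onix.xml")  # assumed file name, containing the sample XML above
root = tree.getroot()
price_amount = root.xpath(pa_xpath)  # always a list of matching elements (may be empty)
print("XPath evaluated to:", price_amount)  # remove after debugging
print("Price:", price_amount[0].text if price_amount else None)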
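Building the expression with an f-string works, but lxml also supports XPath variables passed as keyword arguments, which keeps the condition values out of the expression string (so a value containing a quote cannot break it). An etree.XPath object can also be compiled once and reused for every record. A sketch, assuming the trimmed, namespace-free sample; PRICE_AMOUNT and the variable names role, ptype, currency and qualifier are illustrative:

```python
from lxml import etree

# Trimmed, namespace-free copy of the question's sample.
SAMPLE = """
<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
    </Supplier>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

# Compiled once; $role, $ptype, $currency and $qualifier are bound at call time.
PRICE_AMOUNT = etree.XPath(
    "SupplyDetail[Supplier/SupplierRole=$role]"
    "/Price[PriceType=$ptype and CurrencyCode=$currency and PriceQualifier=$qualifier]"
    "/PriceAmount/text()"
)

root = etree.fromstring(SAMPLE.strip())  # stands in for each <ProductSupply> you process

amounts = PRICE_AMOUNT(root, role="03", ptype="02", currency="NOK", qualifier="05")
price = amounts[0] if amounts else None
print("Price:", price)
```

Ending the path with /text() returns the matching strings directly instead of elements, so no .text lookup is needed.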
See also:
- Official lxml guide: XPath and XSLT with lxml