Parsing an ONIX xml with conditions using Python lxml

I'm trying to extract some information from an ONIX XML format file using the Python lxml parser.

Among other things, the part of the document I'm interested in looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
   <SupplyDetail>
      <Supplier>
         <SupplierRole>03</SupplierRole>
         <SupplierName>EGEN</SupplierName>
      </Supplier>
      <ProductAvailability>40</ProductAvailability>
      <Price>
         <PriceType>01</PriceType>
         <PriceAmount>0.00</PriceAmount>
         <Tax>
            <TaxType>01</TaxType>
            <TaxRateCode>Z</TaxRateCode>
            <TaxRatePercent>0</TaxRatePercent>
            <TaxableAmount>0.00</TaxableAmount>
            <TaxAmount>0.00</TaxAmount>
         </Tax>
         <CurrencyCode>NOK</CurrencyCode>
      </Price>
      <Price>
         <PriceType>02</PriceType>
         <PriceQualifier>05</PriceQualifier>
         <PriceAmount>0.00</PriceAmount>
         <Tax>
            <TaxType>01</TaxType>
            <TaxRateCode>Z</TaxRateCode>
            <TaxRatePercent>0</TaxRatePercent>
            <TaxableAmount>0.00</TaxableAmount>
            <TaxAmount>0.00</TaxAmount>
         </Tax>
         <CurrencyCode>NOK</CurrencyCode>
      </Price>
   </SupplyDetail>
</ProductSupply>

I need to extract the price amount that meets the following conditions:

PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05'

I tried:

price = p.find(
"ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' \
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount").text

For some reason, my XPath using the and operator doesn't work and raises the following error:

File "<string>", line unknown
    SyntaxError: invalid predicate
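The error can be reproduced without the full ONIX file. The sketch below assumes a trimmed, hypothetical copy of the snippet above (Tax elements dropped, made-up amounts so the two prices differ), wrapped in a <Product> element that stands in for p from the question. Note that a path inside a predicate, such as [Supplier/SupplierRole='03'], is also outside what find() can parse, so the sketch additionally tries a path where the and predicate is the only unusual part.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

paths = [
    # the path from the question
    "ProductSupply/SupplyDetail[Supplier/SupplierRole='03']"
    "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount",
    # only an 'and' predicate, no path inside a predicate
    "ProductSupply/SupplyDetail/Price[PriceType='02' and CurrencyCode='NOK']/PriceAmount",
]

errors = []
for path in paths:
    try:
        p.find(path)
    except SyntaxError as exc:  # find() rejects predicates it cannot parse
        errors.append(str(exc))

print(errors)
```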

Any idea how to handle this? Any help is much appreciated!

TL;DR: Use xpath(), because the find*() methods don't support boolean operators like and.
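Applied to the question's code, the fix is essentially swapping find() for xpath(). A minimal sketch, assuming a trimmed, hypothetical sample with made-up amounts and a <Product> element as p; since xpath() returns a list, the text is taken from the first match (if any) instead of calling .text on a possible None.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

path = ("ProductSupply/SupplyDetail[Supplier/SupplierRole='03']"
        "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount")

# xpath() evaluates full XPath 1.0 (including 'and') and returns a list of matches
matches = p.xpath(path)
price = matches[0].text if matches else None
print(price)  # 80.00
```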


As commented, you should use lxml's xpath() method for your (rather complex) XPath expression.

XPath

Your XPath expression contains node tests and predicates that use the boolean operator and (XPath 1.0):

ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02'
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount

Tip: test it online (see Xpather demo). This confirms that it finds exactly one element, as expected: <PriceAmount>0.00</PriceAmount>
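The same check can also be done offline with lxml itself. A sketch, assuming a trimmed, hypothetical sample with made-up amounts: wrapping the path in the XPath 1.0 functions count() and string() lets xpath() return a number (a Python float) and a string instead of a list of elements.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

price_path = ("ProductSupply/SupplyDetail[Supplier/SupplierRole='03']"
              "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount")

# count() yields an XPath number, returned by lxml as a Python float
n_matches = p.xpath(f"count({price_path})")
# string() yields the string value of the first matching node
amount = p.xpath(f"string({price_path})")
print(n_matches, amount)  # 1.0 80.00
```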

Using the find() methods

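According to the Python documentation, you can use the following find methods, which accept a match expression (a tag name or an XPath-like path) as an argument: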

  1. find
  2. findall

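For plain paths without unsupported predicates these methods work fine. A minimal sketch with lxml on a trimmed, hypothetical sample with made-up amounts; findtext(), a third method of the same family, is included because it returns the text directly.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

# find(): the first matching element, or None
first_price = p.find("ProductSupply/SupplyDetail/Price")
# findall(): a list of all matching elements
all_prices = p.findall("ProductSupply/SupplyDetail/Price")
# findtext(): the text of the first match, or None
first_amount = p.findtext("ProductSupply/SupplyDetail/Price/PriceAmount")

print(first_price.findtext("PriceType"), len(all_prices), first_amount)  # 01 2 100.00
```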
Problem: limited XPath syntax support for find()

However, their supported XPath syntax is limited!

The limitations include logical operators like your and. Karl Thornton explains this on his page XML parsing: Python ~ XPath ~ logical AND | Shiori.
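If you would rather stay with find(), one workaround (a different technique from the page mentioned above) is to stack several [tag='text'] predicates, which find() evaluates one after another and which therefore behave like and. A predicate containing a path, such as [Supplier/SupplierRole='03'], is still not supported, so this sketch checks the supplier role in Python. The sample is trimmed and hypothetical, with made-up amounts.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

# Stacked [tag='text'] predicates: each one must hold, like 'and' in full XPath
price_path = "Price[PriceType='02'][CurrencyCode='NOK'][PriceQualifier='05']/PriceAmount"

price = None
for sd in p.iterfind("ProductSupply/SupplyDetail"):
    # a path inside a predicate is not supported by find(), so check the role here
    if sd.findtext("Supplier/SupplierRole") == "03":
        price = sd.findtext(price_path)
        if price is not None:
            break

print(price)  # 80.00
```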

On the other hand, a note in the lxml documentation favours them:

The .find*() methods are usually faster than the full-blown XPath support. They also support incremental tree processing through the .iterfind() method, whereas XPath always collects all results before returning them. They are therefore recommended over XPath for both speed and memory reasons, whenever there is no need for highly selective XPath queries.

(emphasis mine)
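Following that advice, one could keep the path expressions trivial, iterate with iterfind(), and do the selective filtering in plain Python. A minimal sketch; the helper name price_amount(), its parameters and the trimmed sample (with made-up amounts) are hypothetical.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""


def price_amount(product, price_type, currency, qualifier, supplier_role="03"):
    """Return the first PriceAmount text matching all conditions, or None."""
    for sd in product.iterfind("ProductSupply/SupplyDetail"):
        if sd.findtext("Supplier/SupplierRole") != supplier_role:
            continue
        for price in sd.iterfind("Price"):
            # all three conditions must hold, like the 'and' in the XPath version
            if (price.findtext("PriceType") == price_type
                    and price.findtext("CurrencyCode") == currency
                    and price.findtext("PriceQualifier") == qualifier):
                return price.findtext("PriceAmount")
    return None


p = etree.fromstring(XML)
print(price_amount(p, "02", "NOK", "05"))  # 80.00
```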

Using lxml's xpath()

So let's start with the safer and richer xpath() functionality (before optimizing prematurely). For example:

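from lxml import etree

tree = etree.parse("onix.xml")  # placeholder path to your ONIX file

# the node predicates to apply within XPath
sd_predicate = "[Supplier/SupplierRole='03']"
p_predicate = "[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"

pa_xpath = f"ProductSupply/SupplyDetail{sd_predicate}/Price{p_predicate}/PriceAmount"  # building XPath including predicates with f-string
print("Using XPath:", pa_xpath)  # remove after debugging

root = tree.getroot()
# the XPath is relative, so evaluate it on each <Product> (the p in the question)
for p in root.iter("Product"):
    price_amount = p.xpath(pa_xpath)  # list of matching <PriceAmount> elements
    print("XPath evaluated to:", [e.text for e in price_amount])  # remove after debugging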
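A design note on the f-string: lxml's xpath() also accepts XPath variables as keyword arguments (referenced as $name in the expression), so the condition values do not have to be spliced into the string, which also avoids quoting problems. A sketch on a trimmed, hypothetical sample with made-up amounts; the variable names role, ptype, currency and qualifier are arbitrary.

```python
from lxml import etree

# Hypothetical, trimmed sample; <Product> plays the role of p in the question.
XML = b"""<Product><ProductSupply><SupplyDetail>
  <Supplier><SupplierRole>03</SupplierRole><SupplierName>EGEN</SupplierName></Supplier>
  <Price><PriceType>01</PriceType><PriceAmount>100.00</PriceAmount>
    <CurrencyCode>NOK</CurrencyCode></Price>
  <Price><PriceType>02</PriceType><PriceQualifier>05</PriceQualifier>
    <PriceAmount>80.00</PriceAmount><CurrencyCode>NOK</CurrencyCode></Price>
</SupplyDetail></ProductSupply></Product>"""

p = etree.fromstring(XML)

pa_xpath = ("ProductSupply/SupplyDetail[Supplier/SupplierRole=$role]"
            "/Price[PriceType=$ptype and CurrencyCode=$currency"
            " and PriceQualifier=$qualifier]/PriceAmount")

# keyword arguments are bound to the XPath variables $role, $ptype, $currency, $qualifier
matches = p.xpath(pa_xpath, role="03", ptype="02", currency="NOK", qualifier="05")
print([e.text for e in matches])  # ['80.00']
```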

See also: