How to access xml field with lxml?

Python 3.6, lxml, Windows 10

I'm going crazy. I want to access the item fields, but I always get this error:

AttributeError: 'cython_function_or_method' object has no attribute 'item'

I can access everything else (the address fields, etc.) without any problem. How can I access the item fields (sku, quantity, etc.)?

I used this code:

import requests
from lxml import objectify

url = "URL_TO_XML_FILE"
xml_content = requests.get(url).text.encode('utf-8')

xml = objectify.fromstring(xml_content)

for sale in xml.response.sales.sale:
    for item in sale.items.item:
        print(item.sku)

Here is the beginning of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<getnewsalesresult xmlns="https://pmcdn.priceminister.com/res/schema/getnewsales">
  <request>
    <version>2017-08-07</version>
    <user>SELLER</user>
  </request>  

  <response>
    <lastversion>2017-08-07</lastversion>
    <sellerid>95029358</sellerid>
    <sales>

      <sale>
        <purchaseid>297453287592813953</purchaseid>
        <purchasedate>15/12/2018-19:10</purchasedate>
        <deliveryinformation>
          <shippingtype>Normal</shippingtype>
          <isfullrsl>N</isfullrsl>

          <purchasebuyerlogin><![CDATA[LOGIN]]></purchasebuyerlogin>                  
          <purchasebuyeremail>EMAIL</purchasebuyeremail>        


            <deliveryaddress>
            <civility>Mme</civility>
            <lastname><![CDATA[Lastname]]></lastname>
            <firstname><![CDATA[Firstname]]></firstname>
            <address1><![CDATA[STREET]]></address1>
            <address2><![CDATA[]]></address2>
            <zipcode>13570</zipcode>
            <city><![CDATA[Paris]]></city>

            <country><![CDATA[France]]></country>
            <countryalpha2>FX</countryalpha2>

              <phonenumber1></phonenumber1>
              <phonenumber2>PHONENUMBER</phonenumber2>

            </deliveryaddress>

        </deliveryinformation>
        <items>

          <item>
            <sku><![CDATA[SKU1]]></sku>
            <advertid>411812243030</advertid>
            <advertpricelisted>
              <amount>15.99</amount>
              <currency>EUR</currency>
            </advertpricelisted>
            <itemid>551131040</itemid>
            <headline><![CDATA[HEADLINE]]></headline>
            <itemstatus><![CDATA[REQUESTED]]></itemstatus>
            <ispreorder>N</ispreorder>
            <isnego>N</isnego>
            <negotiationcomment></negotiationcomment>
            <price>
              <amount>15.99</amount>
              <currency>EUR</currency>
            </price>
            <isrsl>N</isrsl>
            <isbn></isbn>
            <ean>4363745894373857474; </ean>
            <paymentstatus><![CDATA[INCOMING]]></paymentstatus>
            <sellerscore></sellerscore>
          </item>
        </items>
      </sale>
      <sale>

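The problem is that items is actually a method of ObjectifiedElement (inherited from lxml's Element, where it returns the element's XML attributes), so the expression sale.items returns that method rather than the <items> child, because the method takes precedence.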
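A quick way to see this for yourself is the sketch below. SAMPLE is a hypothetical, trimmed-down document in the same shape as the XML in the question, not the real feed.

```python
from lxml import objectify

# Hypothetical, trimmed-down sample shaped like the XML in the question.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<getnewsalesresult xmlns="https://pmcdn.priceminister.com/res/schema/getnewsales">
  <response>
    <sales>
      <sale>
        <purchaseid>1</purchaseid>
        <items>
          <item><sku>SKU1</sku></item>
        </items>
      </sale>
    </sales>
  </response>
</getnewsalesresult>"""

root = objectify.fromstring(SAMPLE)
sale = root.response.sales.sale

# Normal attribute lookup finds the inherited Element method first.
items_attr = sale.items
print(callable(items_attr))  # it is a method, not the <items> child
print(sale.items())          # (name, value) pairs of <sale>'s XML attributes

# A child whose name does not clash with a method resolves as expected.
print(sale.purchaseid)
```

Since the method object has no attribute called item, sale.items.item fails with the AttributeError from the question.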

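To get the 'items' element you want, you have to ask for the child explicitly instead of letting Python look for a method named items first, which is its usual lookup order. ObjectifiedElement resolves child elements in __getattr__, which Python normally calls only when that regular lookup finds nothing. You can also call it directly: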

sale.__getattr__('items')

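This also works (it is a dictionary-like view of the element's children; note that it only holds the first child with a given name):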

sale.__dict__['items']
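The lookup order can be illustrated without lxml at all. The Node class below is a hypothetical toy stand-in, not lxml's implementation; it only mimics the two relevant pieces, a regular items() method and a __getattr__ fallback that serves children.

```python
class Node:
    """Toy stand-in for an objectify element (hypothetical, not lxml's code)."""

    def __init__(self, **children):
        self._children = children

    def items(self):
        # Plays the role of the inherited Element.items() method.
        return []

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup finds nothing.
        children = self.__dict__.get("_children", {})
        try:
            return children[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Index access never consults methods, so names cannot clash.
        return self._children[name]


sale = Node(items="<items child>", purchaseid="<purchaseid child>")

print(sale.purchaseid)            # no method with that name: fallback is used
print(callable(sale.items))       # the method wins over the child
print(sale.__getattr__("items"))  # calling the fallback directly reaches the child
print(sale["items"])              # so does index access
```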

Modified code:

import requests
from lxml import objectify

url = "URL_TO_XML_FILE"
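# Pass the raw bytes so lxml honours the encoding declared in the XML header
# (ISO-8859-1 here) instead of re-encoding the decoded text as UTF-8.
xml_content = requests.get(url).content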

xml = objectify.fromstring(xml_content)

for sale in xml.response.sales.sale:
    for item in sale.__dict__['items'].item:
        print(item.sku)

Another way to handle this is to avoid the fragile attribute interface altogether:

for sale in xml['response']['sales']['sale']:
    for item in sale['items']['item']:
        print(item['sku'])

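With the dictionary-like index interface, you no longer have to worry about certain attribute names (including common words such as items, index, keys, replace, tag, set, text and values) returning surprising results.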
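As a small sketch of how index access sidesteps these clashes, the example below uses a hypothetical document (SAMPLE is made up for illustration) whose children are named items, text and tag.

```python
from lxml import objectify

# Hypothetical sample whose child names clash with Element attributes.
SAMPLE = b"""<sale>
  <items>
    <item><sku>SKU1</sku></item>
    <item><sku>SKU2</sku></item>
  </items>
  <text>gift wrap</text>
  <tag>priority</tag>
</sale>"""

sale = objectify.fromstring(SAMPLE)

# Attribute access returns the element's own tag name, not the <tag> child.
print(sale.tag)

# Index access always means "child element with this name".
print(sale["tag"])
print(sale["text"])

# Iterating over a child yields every sibling with the same name.
skus = [str(item["sku"]) for item in sale["items"]["item"]]
print(skus)
```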