How do I search for multiple attributes using XPath in ElementTree?

I'd like to isolate the following values from an XML file (https://digitallibrary.un.org/search?ln=en&p=A/RES/72/266&f=&rm=&ln=en&sf=&so=d&rg=50&c=United+Nations+Digital+Library+System&of=xm&fti=0&fti=0):

<collection>
  <record>
    ...
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield> # Value to isolate: A/C.5/72/L.22
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield> # Value to isolate: A/72/682
    </datafield>
    <datafield tag="993" ind1="4" ind2=" ">
      <subfield code="a">A/72/PV.76</subfield> # Value to isolate: A/72/PV.76
    </datafield>
    ...
  </record>
  <record>
    ...
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield> # Value to isolate: A/C.5/72/L.22
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield> # Value to isolate: A/72/682
    </datafield>
  </record>
  ...
</collection>

The code I've written seems to pick up only the first item with tag 993 in each record.

for record in root:
  if record.find("{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']") is not None:
    symbol = record.find("{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']").text
    print(symbol)

Is there a way to loop over every match on multiple attributes using ElementTree's XPath support? Thanks in advance.

The docs say that .find() returns only the first matching subelement. It sounds like you want .findall().

The following seems to work for me:

import xml.etree.ElementTree as ET

# input_file is the path to the downloaded MARCXML file
tree = ET.parse(input_file)
root = tree.getroot()

xpath = "{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']"

for record in root:
    # findall() returns a list (empty when nothing matches), so no None check is needed
    for symbol in record.findall(xpath):
        print(symbol.text)
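
The same loop can be written with a namespace prefix map, which keeps the path readable. This is only a sketch: the inline SAMPLE string is a hypothetical stand-in for the downloaded file (it mirrors the structure shown in the question), and the "marc" prefix and the symbols_per_record name are arbitrary choices, not anything the source file defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown in the question.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield>
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield>
    </datafield>
    <datafield tag="993" ind1="4" ind2=" ">
      <subfield code="a">A/72/PV.76</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield>
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield>
    </datafield>
  </record>
</collection>"""

# The prefix "marc" is an arbitrary local alias for the MARC21 namespace URI.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}
PATH = "marc:datafield[@tag='993']/marc:subfield[@code='a']"

root = ET.fromstring(SAMPLE)

# One list of symbols per record; findall() collects every match, not just the first.
symbols_per_record = [
    [subfield.text for subfield in record.findall(PATH, NS)]
    for record in root.findall("marc:record", NS)
]

for symbols in symbols_per_record:
    print(symbols)
```

With a real file, ET.parse(input_file).getroot() would replace ET.fromstring(SAMPLE); the rest stays the same.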

To complement user3091877's answer, here is an alternative XPath option:

//*[name()="subfield"][@code="a"][parent::*[@tag="993"]]/text()

Edit: this one will return 6 values (@tag=993 and @ind1=3):

//*[name()="subfield"][parent::*[@tag="993" and @ind1="3"]]/text()
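
Note that the two expressions above rely on name(), the parent:: axis, and text(), which the XPath subset in xml.etree.ElementTree does not support, so they need a full XPath 1.0 engine such as lxml. A rough ElementTree-only equivalent of the second expression can be sketched by chaining attribute predicates. The inline SAMPLE, the "marc" prefix, and the ind1_3_symbols name are assumptions made for illustration, and the @code='a' filter is kept from the question's path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown in the question.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield>
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield>
    </datafield>
  </record>
</collection>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Chained predicates act as a logical AND: tag must be 993 and ind1 must be 3.
PATH = ".//marc:datafield[@tag='993'][@ind1='3']/marc:subfield[@code='a']"

root = ET.fromstring(SAMPLE)

# ElementTree returns elements rather than text nodes, so read .text explicitly.
ind1_3_symbols = [subfield.text for subfield in root.findall(PATH, NS)]
print(ind1_3_symbols)
```

Here the search starts at the root with ".//", so it collects matches across all records in one call instead of looping record by record.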