How to get the number of subfield nodes in an XML file

Question

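I am trying to extract data from an XML file. I get the XML file by requesting a previously generated URL from the XML provider's API. Usually the datafield I need occurs only once, but sometimes the datafield node occurs multiple times.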

This is the code I use (it is only part of the code, so the indentation may be a bit off):

from urllib.request import urlopen
import pandas as pd
import xml.etree.ElementTree as ET

# `row` (the generated URL) and `clean_aut` (the result list) are defined elsewhere in the script
with urlopen(str(row)) as response:
    doc = ET.parse(response)
    root = doc.getroot()

namespaces = {
    "zs": "http://www.loc.gov/zing/srw/",
    "": "http://www.loc.gov/MARC21/slim",
}
datafield_nodes_path = "./zs:records/zs:record/zs:recordData/record/datafield"  # XPath
datafield_attribute_filters = [  # which fields to extract
    {
        "tag": "100",  # author
        "ind1": "1",
        "ind2": " ",
    }
]
no_aut = True
for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
    # skip datafields whose attributes do not match the filter
    if any(datafield_node.get(k) != v for attr_dict in datafield_attribute_filters for k, v in attr_dict.items()):
        continue

    for subfield_node in datafield_node.iterfind("./subfield[@code='a']", namespaces=namespaces):
        clean_aut.append(subfield_node.text)  # this gets the author name
        no_aut = False
if no_aut:
    clean_aut.append(None)

This works fine for 80% of the URLs I access, but the remaining 20% are either broken or have multiple subfield_nodes for the datafield_attribute_filter I am searching for.

Here is an example of a URL with multiple occurrences: example link

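When this URL is passed to urlopen, I get the author nine times instead of once. Is there a way to count the occurrences and, if the datafield_node occurs more than once, take only its first occurrence? I tried using ET's findall but did not get a usable result.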

Any help is appreciated.

Answer 1

It is not the solution I was hoping for, but it works:

append_author = 0  # counts how many author subfields have been appended so far
no_aut = True
for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
    if any(datafield_node.get(k) != v for attr_dict in datafield_attribute_filters for k, v in attr_dict.items()):
        continue
    if append_author == 0:  # only read the first matching datafield
        for subfield_node in datafield_node.iterfind("./subfield[@code='a']", namespaces=namespaces):
            clean_aut.append(subfield_node.text)  # this gets the author name
            no_aut = False
            append_author += 1

Once the first field has been added, the other fields are skipped.
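
For the counting part of the question, one possible alternative is to collect the matching datafield nodes into a list first, so that len() gives the number of occurrences and index 0 gives the first one. The snippet below is only a sketch under some assumptions: the sample XML is made up for illustration (it is not the provider's real response), the function name first_author is hypothetical, and the MARC21 namespace is bound to an explicit "marc" prefix rather than the empty prefix used in the question.

```python
import xml.etree.ElementTree as ET

# Made-up SRU/MARC21 response with two matching 100 datafields (illustrative only).
SAMPLE_XML = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:records>
    <zs:record>
      <zs:recordData>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">Doe, Jane</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Some title</subfield>
          </datafield>
        </record>
      </zs:recordData>
    </zs:record>
    <zs:record>
      <zs:recordData>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">Doe, Jane</subfield>
          </datafield>
        </record>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""

# Assumption: an explicit "marc" prefix instead of the empty prefix from the question.
NAMESPACES = {
    "zs": "http://www.loc.gov/zing/srw/",
    "marc": "http://www.loc.gov/MARC21/slim",
}
DATAFIELD_PATH = "./zs:records/zs:record/zs:recordData/marc:record/marc:datafield"
AUTHOR_FILTER = {"tag": "100", "ind1": "1", "ind2": " "}


def first_author(root):
    """Return (number of matching datafields, first author name or None)."""
    matches = [
        node
        for node in root.iterfind(DATAFIELD_PATH, namespaces=NAMESPACES)
        if all(node.get(k) == v for k, v in AUTHOR_FILTER.items())
    ]
    if not matches:
        return 0, None
    # Only look at the first matching datafield and its first 'a' subfield.
    subfield = matches[0].find("./marc:subfield[@code='a']", namespaces=NAMESPACES)
    return len(matches), (subfield.text if subfield is not None else None)


count, author = first_author(ET.fromstring(SAMPLE_XML))
print(count, author)
```

In the original loop this would amount to appending the returned author (or None) to clean_aut once per URL, and the count can be logged to spot responses with repeated datafields.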

xml-parsing

python-3.x