Extracting author names from XML tags using ElementTree

Question

The XML document can be accessed at the following link:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=%2726161999%27&retmode=xml

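I am trying to extract each author's name (ForeName plus LastName) and build a single string that contains only the author names. So far I can only extract the individual fields separately.

Below is the code I tried: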

    import requests
    import xml.etree.ElementTree as et

    r = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id='26161999'&retmode=xml")
    root = et.fromstring(r.content)
    authordata = []
    for elem in root.findall(".//ForeName"):
        elem_ = elem.text
        auth_name = list(elem_.split(" "))
        authordata.append(auth_name)
    # Flatten the nested list by joining each inner list into a single string
    val = [item if isinstance(item, str) else " ".join(item) for item in authordata]
    # Drop duplicates while preserving the original order
    seen = set()
    val = [x for x in val if x not in seen and not seen.add(x)]
    author = ' '.join(val)
    print(author)

The code above produces this output:

Elisa Riccardo Mirco Laura Valentina Antonio Sara Carla Borri Barbara

The expected output is each first name combined with its last name:

Elisa Oppici Riccardo Montioli Mirco Dindo Laura Maccari Valentina Porcari Antonio Lorenzetto Chellini Sara Carla Borri Voltattorni Barbara Cellini

Answer 1

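From your question, I understand that you want to concatenate the ForeName and LastName of each author. You can do this by querying those two fields directly on each Author element in the tree and joining their text: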

import xml.etree.ElementTree as et
import requests

r = requests.get(
     'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="26161999"&retmode=xml'
)
root = et.fromstring(r.content)

author_names = []
for author in root.findall(".//Author"):
    fore_name = author.find('ForeName').text
    last_name = author.find('LastName').text
    author_names.append(fore_name + ' ' + last_name)

print(author_names)

# or to get your exact output format:
print(' '.join(author_names))
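
The snippet above assumes every Author element has both a ForeName and a LastName child; if either is missing, `find(...)` returns `None` and `.text` raises `AttributeError`. As a minimal offline sketch of the same per-Author approach, the example below uses `findtext` with a default value instead. The sample XML and the helper name `author_full_names` are assumptions made for illustration, not the real PubMed response or part of any library.

```python
import xml.etree.ElementTree as et

# Hypothetical, hand-written sample that mimics the Author layout;
# it is not the actual PubMed response.
SAMPLE_XML = """
<PubmedArticleSet>
  <AuthorList>
    <Author><LastName>Oppici</LastName><ForeName>Elisa</ForeName></Author>
    <Author><LastName>Montioli</LastName><ForeName>Riccardo</ForeName></Author>
    <Author><CollectiveName>Example Consortium</CollectiveName></Author>
  </AuthorList>
</PubmedArticleSet>
""".strip()


def author_full_names(xml_text):
    """Return 'ForeName LastName' for each Author that has at least one of them."""
    root = et.fromstring(xml_text)
    names = []
    for author in root.findall(".//Author"):
        # findtext returns the default instead of None when the child is missing
        fore_name = author.findtext("ForeName", default="")
        last_name = author.findtext("LastName", default="")
        full_name = " ".join(part for part in (fore_name, last_name) if part)
        if full_name:
            names.append(full_name)
    return names


print(" ".join(author_full_names(SAMPLE_XML)))
```

Because the names are collected per Author element, no de-duplication step is needed, and two authors who share a first name are both kept.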

python

elementtree

python-3.x