Extracting author names from XML tags using ElementTree
Here is the link to the XML document:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=%2726161999%27&retmode=xml
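I am trying to extract the author names (fore name plus last name) and build a single string containing only those names. So far I can only extract the individual fields separately.
Here is the code I tried: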
import requests
import xml.etree.ElementTree as et

authordata = []
r = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id='26161999'&retmode=xml")
root = et.fromstring(r.content)
for elem in root.findall(".//ForeName"):
    elem_ = elem.text
    auth_name = list(elem_.split(" "))
    authordata.append(auth_name)
val = [item if isinstance(item, str) else " ".join(item) for item in authordata]  # flatten the nested list by joining each inner list back into a string
seen = set()
val = [x for x in val if x not in seen and not seen.add(x)]  # drop duplicates while keeping order
author = ' '.join(val)
print(author)
The code above produces this output:
Elisa Riccardo Mirco Laura Valentina Antonio Sara Carla Borri Barbara
The expected output is each fore name combined with its last name:
Elisa Oppici Riccardo Montioli Mirco Dindo Laura Maccari Valentina Porcari Antonio Lorenzetto Chellini Sara Carla Borri Voltattorni Barbara Cellini
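From your question, I understand that you want to concatenate the ForeName and LastName of each author. You can do this by querying those fields directly on each Author element in the tree and joining the corresponding text values: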
import xml.etree.ElementTree as et
import requests
r = requests.get(
    'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="26161999"&retmode=xml'
)
root = et.fromstring(r.content)
author_names = []
for author in root.findall(".//Author"):
    fore_name = author.find('ForeName').text
    last_name = author.find('LastName').text
    author_names.append(fore_name + ' ' + last_name)
print(author_names)
# or to get your exact output format:
print(' '.join(author_names))
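One caveat with the snippet above: author.find(...) returns None when an Author element has no ForeName or LastName child, and calling .text on None raises AttributeError. Below is a minimal offline sketch of a more defensive variant using findtext with a default. The SAMPLE_XML string and the extract_author_names helper are hypothetical and exist only for illustration; the sample is a trimmed-down stand-in for the real efetch response, and the CollectiveName entry is an assumption added to show the missing-field case.

```python
import xml.etree.ElementTree as et

# Hypothetical, trimmed-down sample shaped like a PubMed efetch response.
# The third Author has no ForeName/LastName (an assumed edge case).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <AuthorList>
      <Author><LastName>Oppici</LastName><ForeName>Elisa</ForeName></Author>
      <Author><LastName>Montioli</LastName><ForeName>Riccardo</ForeName></Author>
      <Author><CollectiveName>Example Consortium</CollectiveName></Author>
    </AuthorList>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_author_names(xml_text):
    """Return a list of 'ForeName LastName' strings, skipping authors with neither field."""
    root = et.fromstring(xml_text)
    names = []
    for author in root.findall(".//Author"):
        # findtext returns the default instead of raising when the child is missing
        fore_name = author.findtext("ForeName", default="")
        last_name = author.findtext("LastName", default="")
        full_name = " ".join(part for part in (fore_name, last_name) if part)
        if full_name:
            names.append(full_name)
    return names


author_names = extract_author_names(SAMPLE_XML)
print(" ".join(author_names))  # Elisa Oppici Riccardo Montioli
```

To run it against the live document, pass r.content from the requests call above to extract_author_names instead of SAMPLE_XML.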