Extracting Affiliation data from a single PubMed record
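Using easyPubMed and a lot of searching, I have managed to extract affiliation data from a single PubMed record (I am still very new to R). The problem is that the data only report part of the affiliation information, which I assume is because the non-standardised strings hold various kinds of information.
My code is as follows: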
library(easyPubMed)

# PubMed query via easyPubMed using the URL of the XML
my_query <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20301425&retmode=xml"
my_entrez_id <- get_pubmed_ids(my_query)
my_abstracts_txt <- fetch_pubmed_data(my_entrez_id, format = "abstract")
print(my_abstracts_txt[1:16])
my_abstracts_xml <- fetch_pubmed_data(my_entrez_id)
class(my_abstracts_xml)
# Article titles, extracted with custom_grep()
my_titles <- custom_grep(my_abstracts_xml, "ArticleTitle", "char")
print(my_titles)

# easyPubMed: extracting affiliation data from a single PubMed record
# Convert XML PubMed records to strings using the articles_to_list() function
# Each record in the list is a string that still includes XML tags
my_PM_list <- articles_to_list(my_abstracts_xml)
class(my_PM_list[[4]])
cat(substr(my_PM_list[[4]], 1, 984))

# Affiliation can be extracted from a specific record using the custom_grep() function
# The fields extracted from the record will be returned as elements of a list or a character vector
curr_PM_record <- my_PM_list[[(length(my_PM_list) - 3)]]
Affiliation_Info.data <- custom_grep(curr_PM_record, tag = "AffiliationInfo")
View(Affiliation_Info.data)
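Ideally, I would like to produce a data frame such as:
PMID : Author : Affiliation
(but for now I am focusing only on extracting all of the affiliation information from the PubMed URL)
I am really struggling to get this working, so any help with this would be greatly appreciated.
Thanks in advance!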
Here is an xml2 approach...
library( xml2 )
library( magrittr )
# read the XML data
doc <- xml2::read_xml( "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20301425&retmode=xml" )
# PMID of the record
pmid <- xml2::xml_find_first( doc, ".//PMID") %>% xml2::xml_text()
# author names as "LastName, ForeName"
authors <- paste(
  xml2::xml_find_all( doc, ".//AuthorList[@Type = 'authors']/Author/LastName") %>% xml2::xml_text(),
  xml2::xml_find_all( doc, ".//AuthorList[@Type = 'authors']/Author/ForeName") %>% xml2::xml_text(),
  sep = ", " )
# affiliation strings
affiliate <- xml2::xml_find_all( doc, ".//AuthorList[@Type = 'authors']/Author/AffiliationInfo/Affiliation") %>% xml2::xml_text()
# combine into a data frame
df <- data.frame( pmid = pmid, authors = authors, affiliate = affiliate )
which looks like:
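One caveat, offered tentatively: the code above collects names and affiliations as separate vectors, so it appears to rely on every author having exactly one affiliation. If an author has none or several, the vectors can fall out of step or `data.frame()` can fail on unequal lengths. A minimal sketch of a per-author variant follows. `extract_affiliations()` and its `author_xpath` argument are hypothetical names introduced here (they are not part of xml2 or easyPubMed), and the inline XML is a made-up sample that only mimics the record's structure.

```r
library(xml2)

# Hypothetical helper (not part of xml2 or easyPubMed): returns one row per
# author/affiliation pair, built author by author so the columns stay aligned.
extract_affiliations <- function(doc,
                                 author_xpath = ".//AuthorList[@Type = 'authors']/Author") {
  pmid <- xml_text(xml_find_first(doc, ".//PMID"))
  authors <- xml_find_all(doc, author_xpath)
  rows <- lapply(authors, function(a) {
    last <- xml_text(xml_find_first(a, "./LastName"))
    fore <- xml_text(xml_find_first(a, "./ForeName"))
    aff <- xml_text(xml_find_all(a, "./AffiliationInfo/Affiliation"))
    # Keep authors without an affiliation as a row with NA
    if (length(aff) == 0) aff <- NA_character_
    data.frame(pmid = pmid,
               author = paste(last, fore, sep = ", "),
               affiliation = aff,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up sample XML (assumption): one author with two affiliations, one with none
sample_xml <- '<PubmedBookArticle><BookDocument><PMID>1</PMID>
<AuthorList Type="authors">
<Author><LastName>Doe</LastName><ForeName>Jane</ForeName>
<AffiliationInfo><Affiliation>Dept A</Affiliation></AffiliationInfo>
<AffiliationInfo><Affiliation>Dept B</Affiliation></AffiliationInfo></Author>
<Author><LastName>Roe</LastName><ForeName>Sam</ForeName></Author>
</AuthorList></BookDocument></PubmedBookArticle>'

df <- extract_affiliations(read_xml(sample_xml))
print(df)
```

For the real record, pass `read_xml()` the efetch URL used above instead of `sample_xml`. For ordinary journal articles, where `AuthorList` may carry no `Type` attribute, a looser `author_xpath` such as `".//AuthorList/Author"` may be needed.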