Is there a faster way to find synonyms for a large list of taxa in R?
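I have a list of roughly 96,000 species names and I need to collect all of their synonyms. I have tried the 'taxize' package with the synonyms() function, which outputs the information I need, but my list is too long for it to work properly. I have looked into the 'taxizedb' package, which has previously been suggested as being faster, but I am not sure which functions in that package do what I want.
Any suggestions would be much appreciated! Thank you!
Code so far: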
library("taxize")
library("tidyverse")
#load in list of species (~96,000)
#vspli <- read.csv(file="AllBHLspecieslist.csv", header=TRUE) #my code
vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus", "Acer saccharum", "Acer palmatum") #workable example
#Use Taxize to search for synonyms
synlist1 <- synonyms(c(vspli), db="itis", rows=1) #currently this line of code crashes before completion when using the list of 96k species
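One way to keep a long run from losing everything when it fails partway is to process the names in batches and save each batch to disk. The sketch below is base R only; `lookup_in_batches`, `fake_lookup`, `batch_size` and `cache_dir` are hypothetical names introduced for illustration, and the stand-in lookup would be swapped for something like `function(x) taxize::synonyms(x, db = "itis", rows = 1)` in real use. It assumes the lookup returns a named list with one element per name, and it does not make the remote queries themselves any faster.

```r
# Hypothetical helper: run `lookup` over `taxa` in batches and cache each
# batch as an .rds file, so an interrupted run can resume where it stopped.
lookup_in_batches <- function(taxa, lookup, batch_size = 500,
                              cache_dir = tempfile("syn_cache_")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  taxa <- unique(taxa)
  batches <- split(taxa, ceiling(seq_along(taxa) / batch_size))
  results <- vector("list", length(batches))
  for (k in seq_along(batches)) {
    cache_file <- file.path(cache_dir, sprintf("batch_%05d.rds", k))
    if (file.exists(cache_file)) {
      res <- readRDS(cache_file)  # reuse work from an earlier run
    } else {
      # A failing batch is reported and skipped instead of stopping the run
      res <- tryCatch(
        lookup(batches[[k]]),
        error = function(e) {
          warning("batch ", k, " failed: ", conditionMessage(e))
          NULL
        }
      )
      if (!is.null(res)) saveRDS(res, cache_file)
    }
    if (!is.null(res)) results[[k]] <- res
  }
  # Concatenate the per-batch named lists into one named list
  do.call(c, results)
}

# Stand-in for the real lookup (assumption: returns a named list per name)
fake_lookup <- function(x) {
  setNames(lapply(x, function(n) paste("synonym of", n)), x)
}

vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus",
           "Acer saccharum", "Acer palmatum")
out <- lookup_in_batches(vspli, fake_lookup, batch_size = 2)
```

Passing the same `cache_dir` on a second run would skip batches that already finished, so only the failed or missing batches are retried.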
In case anyone runs into this in the future, I found the 'taxadb' package, which gets this done much faster. In case it proves useful, here is the code:
library(taxadb)
library(dplyr) #needed for %>%, select(), mutate(), left_join(), filter() and collect()
#create local itis database
td_create("itis",overwrite=FALSE)
allnames<-read.csv(file="AllBHLspecieslist.csv", header=TRUE)
#get IDS for each scientific name
syn1<-allnames %>%
select(Scientific.Name) %>%
mutate(ID=get_ids(Scientific.Name,"itis"))
#Deal with NAs (one name corresponds to more than 1 ITIS code) (~10k names)
syn1_NA<-as.data.frame(syn1$Scientific.Name[is.na(syn1$ID)])
colnames(syn1_NA)<-c("name")
NA_IDS<-NULL
for(i in unique(syn1_NA$name)){
tmp<-as.data.frame(filter_name(i, 'itis')[5])
tmp$name<-paste0(i)
NA_IDS<-rbind(NA_IDS,tmp)
}
#join with original names
colnames(syn1)<-c("name","ID")
IDS<-left_join(syn1,NA_IDS,by="name") #I think it's a left join, double check this
#extract just the unique, non-missing IDs
IDS<-data.frame(ID=c(IDS[,"ID"],IDS[,"acceptedNameUsageID"]))
IDS<-data.frame(ID=unique(IDS$ID[!is.na(IDS$ID)])) #use !is.na(); -is.na() does not drop the NAs
#extract all names with synonyms in ITIS that are at the species level [literally all of them]
#set query
ITIS<-taxa_tbl("itis") %>%
select(scientificName,taxonRank,acceptedNameUsageID,taxonomicStatus) %>%
filter(taxonRank == "species")
#see query
ITIS %>% show_query()
#retrieve results
ITIS_names<-ITIS %>% collect()
#filter to only those that match ITIS codes for all my species
ITIS_names<-ITIS_names %>%
filter(acceptedNameUsageID %in% IDS$ID)
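The step that joins the two ID sources and keeps only unique, non-missing IDs can be checked on a toy example. This is a minimal sketch in base R: the IDs are made up, `merge(..., all.x = TRUE)` stands in for `left_join()`, and `all_ids`, `ids` and `wrong` are names introduced here for illustration. It also shows why missing values should be dropped with `!is.na()` rather than `-is.na()`.

```r
# Toy stand-ins for syn1 and NA_IDS (the IDs are invented for illustration)
syn1 <- data.frame(
  name = c("Acer saccharum", "Acer opalus", "Acer palmatum"),
  ID = c("ITIS:1", NA, "ITIS:3"),
  stringsAsFactors = FALSE
)
NA_IDS <- data.frame(
  acceptedNameUsageID = c("ITIS:2", "ITIS:9"),
  name = c("Acer opalus", "Acer opalus"),
  stringsAsFactors = FALSE
)

# Base R equivalent of a left join: keep every row of syn1
joined <- merge(syn1, NA_IDS, by = "name", all.x = TRUE)

# Pool both ID columns, then drop NAs and duplicates
all_ids <- c(joined$ID, joined$acceptedNameUsageID)
ids <- unique(all_ids[!is.na(all_ids)])

# Pitfall: -is.na() turns TRUE/FALSE into -1/0, so it is read as
# "drop element 1" and leaves the remaining NAs in place
wrong <- all_ids[-is.na(all_ids)]
anyNA(ids)    # FALSE
anyNA(wrong)  # TRUE
```

A name that matched more than one ITIS code, like "Acer opalus" here, contributes all of its accepted IDs to `ids`.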