Is there a faster way to find synonyms for a large list of taxa in R?

Question

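I have a list of approximately 96,000 species names, and I need to collect all of their synonyms. I have tried the 'taxize' package with the synonyms() function, which outputs the information I need, but my list is too long for it to work properly. I have looked into the 'taxizedb' package, which has previously been suggested as faster, but I am not sure which functions in that package would do what I want.

Any suggestions would be much appreciated! Thank you!

Code so far: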

library("taxize")
library("tidyverse")

#load in list of species (~96,000)
#vspli <- read.csv(file="AllBHLspecieslist.csv", header=TRUE) #my code
vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus", "Acer saccharum", "Acer palmatum") #workable example
#Use taxize to search for synonyms
synlist1 <- synonyms(vspli, db="itis", rows=1) #this line crashes before completion when run on the full list of 96k species
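One way to keep a single failure from losing the whole run is to process the names in batches and record which batches failed. The sketch below shows only that batching pattern in base R; it does not make each lookup any faster. `run_in_batches` and `fake_lookup` are hypothetical names, and `fake_lookup` is a stand-in for the real call, which would be something like `function(x) synonyms(x, db = "itis", rows = 1)`.

```r
# Hypothetical helper: apply `lookup` to `names` in batches of `batch_size`.
# A batch that throws an error is recorded as NULL instead of stopping the run.
run_in_batches <- function(names, lookup, batch_size = 500) {
  batch_id <- ceiling(seq_along(names) / batch_size)
  batches <- split(names, batch_id)
  lapply(batches, function(batch) {
    tryCatch(lookup(batch), error = function(e) {
      message("batch failed: ", conditionMessage(e))
      NULL
    })
  })
}

# Stand-in for the real lookup (assumption: it fails on one name to show the pattern)
fake_lookup <- function(x) {
  if ("Acer interius" %in% x) stop("simulated failure")
  setNames(as.list(toupper(x)), x)
}

vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus",
           "Acer saccharum", "Acer palmatum")
res <- run_in_batches(vspli, fake_lookup, batch_size = 2)

# Names of the batches that need to be retried
failed <- names(res)[vapply(res, is.null, logical(1))]
```

With the real lookup, each batch result could also be written to disk with `saveRDS()` as it completes, so that a crash part-way through does not discard the finished batches.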

Answer 1

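In case anyone comes across this question in the future: I found the package 'taxadb', which does this much faster. In case it proves useful, here is the code: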

library(taxadb)
library(dplyr) #needed for %>%, select(), mutate(), filter() and left_join()

#create local itis database
td_create("itis",overwrite=FALSE)

allnames<-read.csv(file="AllBHLspecieslist.csv", header=TRUE)

#get ITIS IDs for each scientific name
syn1<-allnames %>%
  select(Scientific.Name) %>%
  mutate(ID=get_ids(Scientific.Name,"itis"))

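#Deal with NAs (one name corresponds to more than 1 ITIS code) (~10k names)

syn1_NA<-as.data.frame(syn1$Scientific.Name[is.na(syn1$ID)])
colnames(syn1_NA)<-c("name")

#look up every ITIS record for each unresolved name and keep its accepted ID
NA_IDS<-NULL
for(i in unique(syn1_NA$name)){
  #select the column by name rather than by position (5)
  tmp<-as.data.frame(filter_name(i, 'itis')["acceptedNameUsageID"])
  if(nrow(tmp)==0) next #skip names with no ITIS match at all
  tmp$name<-i
  NA_IDS<-rbind(NA_IDS,tmp)
}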

#join with original names
colnames(syn1)<-c("name","ID")
IDS<-left_join(syn1,NA_IDS,by="name") #I think it's a left join; double-check this

#extract just the unique IDs
IDS<-data.frame(ID=unique(c(IDS[,"ID"],IDS[,"acceptedNameUsageID"])))
#drop the NAs; note that -is.na() does not do this, !is.na() does
IDS<-IDS[!is.na(IDS$ID), , drop=FALSE]
#extract all names with synonyms in ITIS that are at the species level [literally all of them]
#set query
ITIS<-taxa_tbl("itis") %>%
  select(scientificName,taxonRank,acceptedNameUsageID,taxonomicStatus) %>%
  filter(taxonRank == "species")

#see query
ITIS %>% show_query()
#retrieve results
ITIS_names<-ITIS %>% collect()

#filter to only those that match ITIS codes for all my species
ITIS_names<-ITIS_names %>%
  filter(acceptedNameUsageID %in% IDS$ID)
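The resulting table has one row per name, so the synonyms of a species are the rows that share its acceptedNameUsageID. A minimal sketch of regrouping them under each accepted name in base R follows. The toy data frame is made up for illustration (its names, IDs and status labels are not real ITIS records) and only mimics the four columns selected above; `toy_names` and `syn_by_accepted` are hypothetical names.

```r
# Made-up rows mimicking the columns of ITIS_names; not real ITIS data
toy_names <- data.frame(
  scientificName = c("Species a", "Species b", "Species c", "Species d"),
  taxonRank = "species",
  acceptedNameUsageID = c("ITIS:1", "ITIS:1", "ITIS:1", "ITIS:2"),
  taxonomicStatus = c("accepted", "synonym", "synonym", "accepted"),
  stringsAsFactors = FALSE
)

# Split the non-accepted names by the accepted ID they point to
is_syn <- toy_names$taxonomicStatus != "accepted"
syn_by_accepted <- split(toy_names$scientificName[is_syn],
                         toy_names$acceptedNameUsageID[is_syn])

# Label each group with its accepted name instead of the ID
accepted <- toy_names[!is_syn, ]
names(syn_by_accepted) <- accepted$scientificName[
  match(names(syn_by_accepted), accepted$acceptedNameUsageID)
]
```

With the real data, `toy_names` would be replaced by `ITIS_names`. The status labels in the actual table may differ from the ones assumed here, so it is worth checking `unique(ITIS_names$taxonomicStatus)` first.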

r

bioinformatics

ropensci