Matching two columns in a data frame to multiple columns in another data frame, returning the first matching column
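I am trying to match two columns in one data frame against another data frame, and I would like the value returned to be the name of the first column in the second data frame in which both initial columns match.
For example, I would like to take the following data frame: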
Fasta<-c("X1","X1","X2","X2","X3","X3")
Species<-c("Kiwi","Chicken","Weta","Cricket","Tuatara","Gecko")
testdata<-as.data.frame(cbind(Fasta,Species))
testdata<-aggregate(Species ~ Fasta, testdata, I)
Fasta Species1 Species2
X1 Kiwi Chicken
X2 Weta Cricket
X3 Tuatara Gecko
Here is my second data frame:
Species<-c("Kiwi","Chicken","Weta","Cricket","Frog","Gecko")
Genus<-c("Orn","Norn","Genus2","Genus2","Spec","NoSpec")
Order<-c("Bird","Bird","Order2","Order2","Norder","Geckn")
Kingdom<-rep("Animal",each=6)
lookup<-data.frame(cbind(Species,Genus,Order,Kingdom))
Species Genus Order Kingdom
Kiwi Orn Bird Animal
Chicken Norn Bird Animal
Weta Genus2 Order2 Animal
Cricket Genus2 Order2 Animal
Frog Spec Norder Animal
Gecko NoSpec Geckn Animal
I want to find the first column in the second data frame in which Species1 and Species2 match, and return its name. Ideally, this would give me the following output:
Fasta Species1 Species2 MatchLevel
X1 Kiwi Chicken Order
X2 Weta Cricket Genus
X3 Tuatara Gecko Kingdom
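I am open to having the data in a different format.
This function takes advantage of the nested nature of taxonomic groups (i.e., if two species are in the same genus, they must also be in the same order, and so on). Two species in the same genus score 3, because all 3 taxonomic levels match; two species in the same order score 2; two species in the same kingdom score 1. It is also possible to have no match.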
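match2species <- function(a, b, lookup_table = lookup) {
  # Rows of the lookup table for each species
  sp_a <- lookup_table[lookup_table$Species == a, ]
  sp_b <- lookup_table[lookup_table$Species == b, ]
  # Count the ranks (Genus, Order, Kingdom) on which the two species agree
  matches <- sum(sp_a[-1] == sp_b[-1])
  # Ranks are nested, so the count identifies the most specific shared rank
  if (matches > 0) c('Kingdom', 'Order', 'Genus')[matches] else 'No match'
}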
The function can be called for any pair of species in the data frame.
> match2species('Chicken','Kiwi')
[1] "Order"
> match2species('Weta','Cricket')
[1] "Genus"
> match2species('Frog','Gecko')
[1] "Kingdom"
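To get the MatchLevel column from the question, the same idea can be applied row by row. The following is only a sketch under some assumptions: `wide` and `first_match_level` are hypothetical names that do not appear in the original post, the data frames are rebuilt with plain character columns, and a species that is missing from the lookup table yields `NA` instead of a rank. Note that Tuatara does not appear in the lookup table, so this sketch returns `NA` for X3 rather than the "Kingdom" shown in the desired output.

```r
# Long-format input and lookup table, as in the question (character columns)
testdata <- data.frame(
  Fasta   = c("X1", "X1", "X2", "X2", "X3", "X3"),
  Species = c("Kiwi", "Chicken", "Weta", "Cricket", "Tuatara", "Gecko"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Species = c("Kiwi", "Chicken", "Weta", "Cricket", "Frog", "Gecko"),
  Genus   = c("Orn", "Norn", "Genus2", "Genus2", "Spec", "NoSpec"),
  Order   = c("Bird", "Bird", "Order2", "Order2", "Norder", "Geckn"),
  Kingdom = rep("Animal", 6),
  stringsAsFactors = FALSE
)

# One row per Fasta, with the two species in separate columns
# (assumes exactly two species per Fasta, as in the example)
by_fasta <- split(testdata$Species, testdata$Fasta)
wide <- data.frame(
  Fasta    = names(by_fasta),
  Species1 = vapply(by_fasta, `[`, character(1), 1, USE.NAMES = FALSE),
  Species2 = vapply(by_fasta, `[`, character(1), 2, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk the ranks from most to least specific and
# return the first rank on which both species agree, or NA if none
first_match_level <- function(a, b, lookup_table = lookup,
                              ranks = c("Genus", "Order", "Kingdom")) {
  # match() gives NA for an unknown species, which selects an all-NA row
  row_a <- lookup_table[match(a, lookup_table$Species), ranks]
  row_b <- lookup_table[match(b, lookup_table$Species), ranks]
  same <- unname(unlist(row_a) == unlist(row_b))
  hit <- which(same)  # which() ignores NA comparisons
  if (length(hit) == 0) NA_character_ else ranks[hit[1]]
}

# Apply the helper to every pair of species
wide$MatchLevel <- mapply(first_match_level, wide$Species1, wide$Species2,
                          USE.NAMES = FALSE)
print(wide)
```

Scanning the ranks in order, rather than counting matches, does not rely on the ranks being strictly nested, which may be safer if the lookup table contains inconsistencies.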