Efficiently accessing pairwise distances
I have a distance matrix:
> mat
          hydrogen   helium  lithium beryllium    boron
hydrogen  0.000000 2.065564 3.940308  2.647510 2.671674
helium    2.065564 0.000000 2.365661  1.697749 1.319400
lithium   3.940308 2.365661 0.000000  3.188148 2.411567
beryllium 2.647510 1.697749 3.188148  0.000000 2.499369
boron     2.671674 1.319400 2.411567  2.499369 0.000000
And a data frame:
> results
     El1      El2 Score
  Helium Hydrogen    92
   Boron   Helium    61
   Boron  Lithium    88
I want to look up the pairwise distance between the words in results$El1 and results$El2 for every row, to get the following result:
> results
     El1      El2 Score     Dist
  Helium Hydrogen    92 2.065564
   Boron   Helium    61 1.319400
   Boron  Lithium    88 2.411567
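I did it with a for loop, but it seems really clunky. Is there a more elegant way to search for and extract the distances with fewer lines of code?
Here is my current code: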
names = row.names(mat)
num.results <- dim(results)[1]
El1 = match(results$El1, names)
El2 = match(results$El2, names)
el.dist <- matrix(0, num.results, 1)
for (i1 in c(1:num.results)) {
  el.dist[i1, 1] <- mat[El1[i1], El2[i1]]
}
results$Dist = el.dist[,1]
cols <- match(tolower(results$El1), colnames(mat))
rows <- match(tolower(results$El2), colnames(mat))
results$Dist <- mat[cbind(rows, cols)]
results
El1 El2 Score Dist
1 Helium Hydrogen 92 2.065564
2 Boron Helium 61 1.319400
3 Boron Lithium 88 2.411567
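You'll recognize most of the code. The part to focus on is mat[cbind(rows, cols)]. A matrix can be subset by another matrix that has as many columns as the first has dimensions. From the ?`[` help: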
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
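As a small illustration of the matrix indexing described in that help text, here is a sketch on a toy matrix. The matrix m and the index matrices idx and idx_names are made up for this example and are not part of the question's data.

```r
# A 3 x 3 toy matrix with made-up values (filled column by column).
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Each row of idx is one (row, column) pair to extract.
idx <- cbind(c(1, 3), c(2, 1))
picked <- m[idx]                  # the elements m[1, 2] and m[3, 1]

# A two-column character matrix is matched against the dimnames instead.
idx_names <- cbind(c("a", "c"), c("b", "a"))
picked_by_name <- m[idx_names]    # the same two elements, looked up by name
```

Because a character index matrix is matched against the dimnames, the match() step can be skipped when the names in the data frame already agree with the matrix names.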
Another approach
results$Dist <- mapply(function(x, y) mat[tolower(x), tolower(y)],
results$El1, results$El2)
This assumes that results stores El1 and El2 as character rather than factor.
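If the columns might be factors, one option is to convert them explicitly before the lookup. The following is a minimal sketch, assuming a four-element subset of the matrix above and a results data frame deliberately built with factor columns; el1 and el2 are helper names introduced only for this example.

```r
# Subset of the distance matrix from the question (values copied from above).
elements <- c("hydrogen", "helium", "lithium", "boron")
mat <- matrix(c(0.000000, 2.065564, 3.940308, 2.671674,
                2.065564, 0.000000, 2.365661, 1.319400,
                3.940308, 2.365661, 0.000000, 2.411567,
                2.671674, 1.319400, 2.411567, 0.000000),
              nrow = 4, byrow = TRUE,
              dimnames = list(elements, elements))

# El1 and El2 are stored as factors here on purpose.
results <- data.frame(El1 = factor(c("Helium", "Boron", "Boron")),
                      El2 = factor(c("Hydrogen", "Helium", "Lithium")),
                      Score = c(92, 61, 88))

# Convert to lower-case character so the values match the dimnames of mat.
el1 <- tolower(as.character(results$El1))
el2 <- tolower(as.character(results$El2))

# Index with a two-column character matrix: one (row, column) name pair per row.
results$Dist <- mat[cbind(el1, el2)]
```

Calling as.character() first makes the lookup use the labels, not the underlying integer codes of the factor.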
Result
> results
El1 El2 Score Dist
1 Helium Hydrogen 92 2.065564
2 Boron Helium 61 1.319400
3 Boron Lithium 88 2.411567