
Efficiently accessing pairwise distances


> mat
          hydrogen   helium  lithium beryllium    boron
hydrogen  0.000000 2.065564 3.940308  2.647510 2.671674
helium    2.065564 0.000000 2.365661  1.697749 1.319400
lithium   3.940308 2.365661 0.000000  3.188148 2.411567
beryllium 2.647510 1.697749 3.188148  0.000000 2.499369
boron     2.671674 1.319400 2.411567  2.499369 0.000000


> results

El1     El2       Score
Helium  Hydrogen  92
Boron   Helium    61
Boron   Lithium   88

I want to look up the pairwise distance for each pair of words in results$El1 and results$El2, to get the following result:

> results

El1     El2       Score  Dist
Helium  Hydrogen  92     2.065564
Boron   Helium    61     1.319400
Boron   Lithium   88     2.411567

I did it with a for loop, but it feels really clunky. Is there a more elegant way to look up and extract the distances with fewer lines of code?


names <- row.names(mat)
num.results <- nrow(results)
# lower-case the element names so they match the row names of mat
El1 <- match(tolower(results$El1), names)
El2 <- match(tolower(results$El2), names)
el.dist <- matrix(0, num.results, 1)
for (i1 in seq_len(num.results)) {
  el.dist[i1, 1] <- mat[El1[i1], El2[i1]]
}
results$Dist <- el.dist[, 1]

cols <- match(tolower(results$El1), colnames(mat))
rows <- match(tolower(results$El2), rownames(mat))
# index with a two-column matrix: one (row, column) pair per result
results$Dist <- mat[cbind(rows, cols)]
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567

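You'll recognize most of the code. The part to focus on is mat[cbind(rows, cols)]. A matrix can be subset by another matrix that has as many columns as the first has dimensions. From the ?`[` help: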

When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
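
To make the quoted behaviour concrete, here is a minimal sketch on a toy matrix. The matrix m and the index names are made up for illustration and are not part of the question's data. It also shows that, as far as I know, a character matrix works the same way, with each entry matched against the dimnames.

```r
# Toy 3 x 3 matrix with dimnames (hypothetical data, not from the question)
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Numeric index matrix: each row is one (row, column) pair
idx <- cbind(c(1, 3), c(2, 1))
picked_by_position <- m[idx]   # elements m[1, 2] and m[3, 1]

# Character index matrix: entries are matched against the dimnames
idx_chr <- cbind(c("a", "c"), c("b", "a"))
picked_by_name <- m[idx_chr]   # elements m["a", "b"] and m["c", "a"]
```

Both calls return a plain vector with one element per row of the index matrix, which is why the result can be assigned straight into a data frame column.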


results$Dist <- mapply(function(x, y) mat[tolower(x), tolower(y)],
                       results$El1, results$El2)

This assumes that results uses character rather than factor for El1 and El2.
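
If the columns might be factors, one option is to convert them explicitly before the lookup. The following is a sketch under assumptions: it rebuilds a four-element subset of the question's matrix so that it runs on its own, and key() is a hypothetical helper introduced here, not something from the answers above.

```r
# Rebuild a 4 x 4 subset of the distance matrix from the question
elements <- c("hydrogen", "helium", "lithium", "boron")
mat <- matrix(0, 4, 4, dimnames = list(elements, elements))
mat[lower.tri(mat)] <- c(2.065564, 3.940308, 2.671674,
                         2.365661, 1.319400, 2.411567)
mat <- mat + t(mat)   # mirror the lower triangle to make it symmetric

# Same results table, deliberately stored with factor columns
results <- data.frame(El1 = c("Helium", "Boron", "Boron"),
                      El2 = c("Hydrogen", "Helium", "Lithium"),
                      Score = c(92, 61, 88),
                      stringsAsFactors = TRUE)

# Hypothetical helper: factor or character in, lower-case character out
key <- function(x) tolower(as.character(x))

# Character index matrix: one (row name, column name) pair per result
results$Dist <- mat[cbind(key(results$El2), key(results$El1))]
```

Indexing by name skips the match() step, at the cost of an error if a name in results is missing from the dimnames of mat.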


> results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567