What is the difference between WMD (Word Mover's Distance) and WMD-based similarity?

Question

I am using WMD to compute the similarity between sentences. For example:

distance = model.wmdistance(sentence_obama, sentence_president)

Reference: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

However, there is also a WMD-based similarity method (WmdSimilarity).

Reference: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

Apart from the obvious fact that one is a distance and the other a similarity, is there any difference between the two?

Update: apart from how the result is represented, the two are exactly the same.

n_queries = len(query)
result = []
for qidx in range(n_queries):
    # Compute similarity for each query.
    qresult = [self.w2v_model.wmdistance(document, query[qidx]) for document in self.corpus]
    qresult = numpy.array(qresult)
    qresult = 1./(1.+qresult)  # Similarity is the negative of the distance.

    # Append single query result to list of all results.
    result.append(qresult)

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/similarities/docsim.py

Answer 1

I think your 'update' more or less answers your own question.

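The only difference between the two calculations is that one yields a distance and the other a similarity. As the notebook you link notes in the relevant section: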

WMD is a measure of distance. The similarities in WmdSimilarity are simply the negative distance. Be careful not to confuse distances and similarities. Two similar documents will have a high similarity score and a small distance; two very different documents will have low similarity score, and a large distance.

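As the code you excerpted shows, the similarity measure used there is not exactly the 'negative' distance. It is rescaled as 1 / (1 + distance), so every similarity value falls between 0.0 (exclusive) and 1.0 (inclusive). (That is, a distance of zero becomes a similarity of 1.0, while ever-larger distances get ever closer to 0.0.)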
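To make the rescaling concrete, here is a minimal sketch of the same 1 / (1 + distance) mapping applied to a few distances. The function name `wmd_to_similarity` and the sample distances are made up for illustration; they are not part of gensim, and the values are not real WMD output.

```python
import numpy as np


def wmd_to_similarity(distances):
    """Map non-negative distances to similarities in (0, 1] via 1 / (1 + d).

    Hypothetical helper mirroring the line `qresult = 1./(1.+qresult)`
    from the excerpt in the question; it is not a gensim function.
    """
    d = np.asarray(distances, dtype=float)
    return 1.0 / (1.0 + d)


# Made-up distances, standing in for values returned by model.wmdistance(...).
example_distances = [0.0, 1.0, 3.0]
similarities = wmd_to_similarity(example_distances)

for d, s in zip(example_distances, similarities):
    print(f"distance={d:.2f} -> similarity={s:.2f}")
```

Because the mapping is strictly decreasing, ranking documents by ascending distance or by descending similarity gives the same order; only the scale of the numbers changes.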

nlp

nltk

gensim

word2vec

word-embedding