What is the difference between WMD (Word Mover's Distance) and WMD-based similarity?

I am using WMD to compute the similarity between sentences. For example:

distance = model.wmdistance(sentence_obama, sentence_president)

Reference: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

However, there is also a WMD-based similarity method (WmdSimilarity).

Reference: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

Apart from the obvious point that one is a distance and the other a similarity, what is the difference between the two?

Update: apart from how the result is expressed, the two are exactly the same.

n_queries = len(query)
result = []
for qidx in range(n_queries):
    # Compute similarity for each query.
    qresult = [self.w2v_model.wmdistance(document, query[qidx]) for document in self.corpus]
    qresult = numpy.array(qresult)
    qresult = 1./(1.+qresult)  # Similarity is the negative of the distance.

    # Append single query result to list of all results.
    result.append(qresult)

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/similarities/docsim.py

I think your 'update' has more or less answered your own question.

That one is a distance and the other a similarity is the only difference between the two calculations. As the notebook you link notes in the relevant section:

WMD is a measure of distance. The similarities in WmdSimilarity are simply the negative distance. Be careful not to confuse distances and similarities. Two similar documents will have a high similarity score and a small distance; two very different documents will have low similarity score, and a large distance.

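As the code you excerpted shows, though, the similarity measure used there is not exactly the 'negative' distance. It is rescaled so that all similarity values fall between 0.0 (exclusive) and 1.0 (inclusive). (That is, a distance of zero becomes a similarity of 1.0, while ever-larger distances map ever closer to 0.0.)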
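To make the rescaling concrete, here is a minimal sketch of the 1/(1 + distance) mapping from the excerpt, applied to plain numbers rather than real WMD values. The helper names wmd_to_similarity and similarity_to_wmd are made up for illustration and are not part of gensim.

```python
def wmd_to_similarity(distance):
    """Map a non-negative distance to a similarity via 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


def similarity_to_wmd(similarity):
    """Invert the mapping: recover the distance from a similarity in (0, 1]."""
    return 1.0 / similarity - 1.0


# Illustrative distances only; real values would come from model.wmdistance(...).
distances = [0.0, 1.0, 3.0]
similarities = [wmd_to_similarity(d) for d in distances]

for d, s in zip(distances, similarities):
    print(f"distance={d:.2f} -> similarity={s:.2f}")
```

Because the mapping is strictly decreasing, ranking documents by ascending distance or by descending similarity gives the same order, and since it is invertible, no information is lost in either direction.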