为什么斯坦福NLP课程中edit Distance算法的定义加2而不是1

Question

我正在通过以下幻灯片学习斯坦福 NLP 课程：https://web.stanford.edu/class/cs124/lec/med.pdf。本幻灯片中编辑距离算法的定义如下：

初始化

D(i,0) = i
D(0,j) = j

重复关系：

 For each  i = 1…M
    For each  j = 1…N


       D(i,j)= min  {D(i-1,j) + 1, D(i,j-1) + 1, 
                     D(i-1,j-1) +   2(if X(i) ≠ Y(j) )  
                                    0(if X(i) = Y(j))}

为什么 D(i-1,j-1) + 2 不是 (+1)，如果 X(i) ≠ Y(j)。我发现维基百科中编辑距离算法的定义是'+1'：https://en.wikipedia.org/wiki/Levenshtein_distance。请你们解释一下这两个定义的区别。我是 NLP 的新手。谢谢！

Answer 1

When editing one string, what is the minimal number of changes you need to do in order to get another string?

这是编辑距离的一般定义，并非具体定义。要获得准确的定义，您需要定义 "change" 是什么。

如果 "change" 包含 "replacing one letter by another"，则您的定义中有 +1
如果 "change" 不包含 "replacing one letter by another"，则您的定义中有 +2

示例：将 feh 变为 fah 需要多少更改？

一项更改 - 只需将 e 替换为 a
两项更改 - 删除 e；然后在同一个地方添加a

这两个答案都很有用，并且导致了两个略有不同的编辑距离定义。

为什么斯坦福NLP课程中edit Distance算法的定义加2而不是1

Why the definition of edit Distance algorithm in Stanford NLP course plus 2 not 1

algorithm

stanford-nlp

levenshtein-distance