调试 Levenshtein 距离实现 - 如何计算最小距离?

Debugging Levenshtein distance implementation - how is the min distance being calculated?

一段时间以来,我一直在努力理解这段代码,但我无法理解距离是如何计算的。我知道该算法应该如何工作(它提供了两个字符串之间字符的编辑距离,我可以在纸上完成),但我不明白这一行如何(作为示例)

d(i, j) = d(i - 1, j - 1)

返回为整数。 min1、min2 相同。我们已经定义

d(i,0)

d(0,j)

,但是在

的时候怎么取值

d(i,j)

两个参数之一不为 0 吗?

代码:

Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)

Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer

l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
    d(i, 0) = i
Next
For j = 0 To l2
    d(0, j) = j
Next
For i = 1 To l1
    For j = 1 To l2
        If Mid(s1, i, 1) = Mid(s2, j, 1) Then
            d(i, j) = d(i - 1, j - 1)   
        Else
            min1 = d(i - 1, j) + 1
            min2 = d(i, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            min2 = d(i - 1, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            d(i, j) = min1
        End If
    Next
Next
Levenshtein = d(l1, l2)
End Function

?Levenshtein("saturday","sunday")

3

我对 Levenshtein 了解不多,但我可以解释该数组中发生了什么,请参阅代码中的注释:

l1 = Len(s1) 'this sets a variable with the length of the first string
l2 = Len(s2) 'this sets a variable with the length of the second string

ReDim d(l1, l2) 'redimension the array to this size, though this can be seen as ReDim d(0 to l1, 0 to l2)

For i = 0 To l1 'for i = 0 to length of first string
    d(i, 0) = i 'allocate the value of i to each row in the array, at col 0
Next i

For j = 0 To l2 'for j = 0 to length of second string
    d(0, j) = j 'allocate the value of j to each column in the array, at row 0
Next j

编辑: 我添加了一些调试(将打印到 ActiveSheet)。蓝色代表字母匹配的地方,红色代表其他字母......虽然颜色并不重要。

据我所知,在每次循环迭代中,它都会构建一个差异矩阵,从比较 1:1 第一个字母开始,直到最后一个。基于循环位置,并且在每次迭代中,它将使用先前的位置值来计算当前位置的差异。

使用较短的字符串可能更容易理解。

  • 遍历第二个单词中的每个字母,然后循环遍历第一个单词中的每个字母

  • 在第一个外循环(对于每一行),第一个内循环(对于每一列),我们有 S 满足 S = 0。(d(i, j) = d(i - 1, j - 1) 的计算结果为 d(i, j) = d(1 - 1, 1 - 1),即:0)

  • 在第一个外循环,第二个内循环,我们有 S 满足 U = 1.

  • 等等等等

除此之外,单步执行代码并查看变量如何根据 if 条件变化...不确定我能否更好地解释这一点。

Public Function Levenshtein(s1 As String, s2 As String)

Dim i As Integer, j As Integer
Dim l1 As Integer, l2 As Integer
Dim min1 As Integer, min2 As Integer
Dim d() As Integer

'For debugging purposes only
Cells.Clear
Dim rngOutput As Range: Set rngOutput = ActiveSheet.Range("A1").Resize(Len(s1) + 2, Len(s2) + 2)

With rngOutput
    .ColumnWidth = 3
    .HorizontalAlignment = xlCenter
    .VerticalAlignment = xlCenter
End With

l1 = Len(s1): l2 = Len(s2): ReDim d(l1, l2)

For i = 0 To l1
    d(i, 0) = i
    With rngOutput
        .Cells(i + 3, 1) = Mid(s1, i + 1, 1)
        If Not i = 0 Then .Cells(i + 2, 2) = i
    End With
Next i

For j = 0 To l2
    d(0, j) = j
    With rngOutput
        .Cells(1, j + 3) = Mid(s2, j + 1, 1)
        If Not j = 0 Then .Cells(2, j + 2) = j
    End With
Next j

For i = 1 To l1
    For j = 1 To l2
        If Mid(s1, i, 1) = Mid(s2, j, 1) Then
            d(i, j) = d(i - 1, j - 1)

            With rngOutput.Cells(i + 2, j + 2)
                .Value = d(i, j)
                .Font.Color = vbBlue
            End With
        Else
            min1 = d(i - 1, j) + 1
            min2 = d(i, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            min2 = d(i - 1, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            d(i, j) = min1

            With Cells(i + 2, j + 2)
                .Value = d(i, j)
                .Font.Color = vbRed
            End With
        End If
    Next
Next

Levenshtein = d(l1, l2)
End Function