调试 Levenshtein 距离实现 - 如何计算最小距离?
Debugging Levenshtein distance implementation - how is the min distance being calculated?
一段时间以来,我一直在努力理解这段代码,但我无法理解距离是如何计算的。我知道该算法应该如何工作(它提供了两个字符串之间字符的编辑距离,我可以在纸上完成),但我不明白这一行如何(作为示例)
d(i, j) = d(i - 1, j - 1)
返回为整数。 min1、min2 相同。我们已经定义
d(i,0)
和
d(0,j)
,但是在
的时候怎么取值
d(i,j)
两个参数之一不为 0 吗?
代码:
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer
l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
Next
For j = 0 To l2
d(0, j) = j
Next
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
?Levenshtein("saturday","sunday")
3
我对 Levenshtein 了解不多,但我可以解释该数组中发生了什么,请参阅代码中的注释:
l1 = Len(s1) 'this sets a variable with the length of the first string
l2 = Len(s2) 'this sets a variable with the length of the second string
ReDim d(l1, l2) 'redimension the array to this size, though this can be seen as ReDim d(0 to l1, 0 to l2)
For i = 0 To l1 'for i = 0 to length of first string
d(i, 0) = i 'allocate the value of i to each row in the array, at col 0
Next i
For j = 0 To l2 'for j = 0 to length of second string
d(0, j) = j 'allocate the value of j to each column in the array, at row 0
Next j
编辑: 我添加了一些调试(将打印到 ActiveSheet)。蓝色代表字母匹配的地方,红色代表其他字母......虽然颜色并不重要。
据我所知,在每次循环迭代中,它都会构建一个差异矩阵,从比较 1:1 第一个字母开始,直到最后一个。基于循环位置,并且在每次迭代中,它将使用先前的位置值来计算当前位置的差异。
使用较短的字符串可能更容易理解。
遍历第二个单词中的每个字母,然后循环遍历第一个单词中的每个字母
在第一个外循环(对于每一行),第一个内循环(对于每一列),我们有 S
满足 S
= 0。(d(i, j) = d(i - 1, j - 1)
的计算结果为 d(i, j) = d(1 - 1, 1 - 1)
,即:0
)
在第一个外循环,第二个内循环,我们有 S
满足 U
= 1.
等等等等
除此之外,单步执行代码并查看变量如何根据 if 条件变化...不确定我能否更好地解释这一点。
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer, j As Integer
Dim l1 As Integer, l2 As Integer
Dim min1 As Integer, min2 As Integer
Dim d() As Integer
'For debugging purposes only
Cells.Clear
Dim rngOutput As Range: Set rngOutput = ActiveSheet.Range("A1").Resize(Len(s1) + 2, Len(s2) + 2)
With rngOutput
.ColumnWidth = 3
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
End With
l1 = Len(s1): l2 = Len(s2): ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
With rngOutput
.Cells(i + 3, 1) = Mid(s1, i + 1, 1)
If Not i = 0 Then .Cells(i + 2, 2) = i
End With
Next i
For j = 0 To l2
d(0, j) = j
With rngOutput
.Cells(1, j + 3) = Mid(s2, j + 1, 1)
If Not j = 0 Then .Cells(2, j + 2) = j
End With
Next j
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
With rngOutput.Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbBlue
End With
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
With Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbRed
End With
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
一段时间以来,我一直在努力理解这段代码,但我无法理解距离是如何计算的。我知道该算法应该如何工作(它提供了两个字符串之间字符的编辑距离,我可以在纸上完成),但我不明白这一行如何(作为示例)
d(i, j) = d(i - 1, j - 1)
返回为整数。 min1、min2 相同。我们已经定义
d(i,0)
和
d(0,j)
,但是在
的时候怎么取值d(i,j)
两个参数之一不为 0 吗?
代码:
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer
l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
Next
For j = 0 To l2
d(0, j) = j
Next
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
?Levenshtein("saturday","sunday")
3
我对 Levenshtein 了解不多,但我可以解释该数组中发生了什么,请参阅代码中的注释:
l1 = Len(s1) 'this sets a variable with the length of the first string
l2 = Len(s2) 'this sets a variable with the length of the second string
ReDim d(l1, l2) 'redimension the array to this size, though this can be seen as ReDim d(0 to l1, 0 to l2)
For i = 0 To l1 'for i = 0 to length of first string
d(i, 0) = i 'allocate the value of i to each row in the array, at col 0
Next i
For j = 0 To l2 'for j = 0 to length of second string
d(0, j) = j 'allocate the value of j to each column in the array, at row 0
Next j
编辑: 我添加了一些调试(将打印到 ActiveSheet)。蓝色代表字母匹配的地方,红色代表其他字母......虽然颜色并不重要。
据我所知,在每次循环迭代中,它都会构建一个差异矩阵,从比较 1:1 第一个字母开始,直到最后一个。基于循环位置,并且在每次迭代中,它将使用先前的位置值来计算当前位置的差异。
使用较短的字符串可能更容易理解。
遍历第二个单词中的每个字母,然后循环遍历第一个单词中的每个字母
在第一个外循环(对于每一行),第一个内循环(对于每一列),我们有
S
满足S
= 0。(d(i, j) = d(i - 1, j - 1)
的计算结果为d(i, j) = d(1 - 1, 1 - 1)
,即:0
)在第一个外循环,第二个内循环,我们有
S
满足U
= 1.等等等等
除此之外,单步执行代码并查看变量如何根据 if 条件变化...不确定我能否更好地解释这一点。
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer, j As Integer
Dim l1 As Integer, l2 As Integer
Dim min1 As Integer, min2 As Integer
Dim d() As Integer
'For debugging purposes only
Cells.Clear
Dim rngOutput As Range: Set rngOutput = ActiveSheet.Range("A1").Resize(Len(s1) + 2, Len(s2) + 2)
With rngOutput
.ColumnWidth = 3
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
End With
l1 = Len(s1): l2 = Len(s2): ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
With rngOutput
.Cells(i + 3, 1) = Mid(s1, i + 1, 1)
If Not i = 0 Then .Cells(i + 2, 2) = i
End With
Next i
For j = 0 To l2
d(0, j) = j
With rngOutput
.Cells(1, j + 3) = Mid(s2, j + 1, 1)
If Not j = 0 Then .Cells(2, j + 2) = j
End With
Next j
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
With rngOutput.Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbBlue
End With
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
With Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbRed
End With
End If
Next
Next
Levenshtein = d(l1, l2)
End Function