任何人都可以改进 VBA 的以下 Fuzzyfind 函数吗?
Can anyone improve on the below Fuzzyfind function for VBA?
此功能可让您从一定范围内找到相似的字符串,而无需进行精确搜索。
公式如下所示:=FuzzyFind(A1,B$1:B$20)
假设您要搜索的字符串在 A1 中
并且您的参考或选项 table 是 B1:B20
代码在这里:
Function FuzzyFind(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
For Each cell In tbl_array
str = cell
For i = 1 To Len(lookup_value)
If InStr(cell, Mid(lookup_value, i, 1)) > 0 Then
a = a + 1
cell = Mid(cell, 1, InStr(cell, Mid(lookup_value, i, 1)) - 1) & Mid(cell, InStr(cell, Mid(lookup_value, i, 1)) + 1, 9999)
End If
Next i
a = a - Len(cell)
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind = Value
End Function
这个函数的结果是命中注定的。谁能提高这个算法的智能?
谢谢:)
我不确定 "FuzzyFind" 到底包含什么,但这是一个使用 Levenshtein distance 查找相似数据的 VLOOKUP。
Levenshtein 距离让您可以 select 指定一个 "percentage match",而不是普通 VLOOKUP 中的典型 TRUE
或 FALSE
:
用法是:DTVLookup(A1,$C:$C0,1,90)
其中 90 是 Levenshtein 距离。
DTVLookup(Value To Find, Range to Search, Column to Return, [Percentage Match])
我通常在比较来自不同数据库的名称时使用它,例如:
Correct Name Example Lookup Percentage Match Other Report
John S Smith John Smith 83 John Smith
Barb Jones Barbara Jones 77 Barbara Jones
Jeffrey Bridge Jeff Bridge 79 Jeff Bridge
Joseph Park Joseph P. Park 79 Joseph P. Park
Jefrey Jones jefre jon 75 jefre jon
Peter Bridge peter f. bridge 80 peter f. bridge
代码如下:
Function DTVLookup(TheValue As Variant, TheRange As Range, TheColumn As Long, Optional PercentageMatch As Double = 100) As Variant
If TheColumn < 1 Then
DTVLookup = CVErr(xlErrValue)
Exit Function
End If
If TheColumn > TheRange.Columns.Count Then
DTVLookup = CVErr(xlErrRef)
Exit Function
End If
Dim c As Range
For Each c In TheRange.Columns(1).Cells
If UCase(TheValue) = UCase(c) Then
DTVLookup = c.Offset(0, TheColumn - 1)
Exit Function
ElseIf PercentageMatch <> 100 Then
If Levenshtein3(UCase(TheValue), UCase(c)) >= PercentageMatch Then
DTVLookup = c.Offset(0, TheColumn - 1)
Exit Function
End If
End If
Next c
DTVLookup = CVErr(xlErrNA)
End Function
Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long
string1_length = Len(string1): string2_length = Len(string2)
distance(0, 0) = 0
For i = 1 To string1_length: distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length: distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
For j = 1 To string2_length
If smStr1(i) = smStr2(j) Then
distance(i, j) = distance(i - 1, j - 1)
Else
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min2 < min1 Then
If min2 < min3 Then minmin = min2 Else minmin = min3
Else
If min1 < min3 Then minmin = min1 Else minmin = min3
End If
distance(i, j) = minmin
End If
Next
Next
' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)
End Function
试试这个,我想它会找到最佳匹配
Function FuzzyFind2(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean
b = 0
For Each cell In tbl_array
str = cell
i = 1
Found = True
Do While Found = True
Found = False
If InStr(i, str, lookup_value) > 0 Then
a = a + 1
Found = True
i = InStr(i, str, lookup_value) + 1
End If
Loop
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind2 = Value
End Function
我一直在寻找这个主题,福尔摩斯四世的答案无疑是最好的。我只想添加一个小更新以始终以大写形式进行比较。对于我的问题,它向我推荐了更准确的选项。
Function FuzzyFind3(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean
b = 0
For Each cell In tbl_array
str = UCase(cell)
i = 1
Found = True
Do While Found = True
Found = False
If InStr(i, str, UCase(lookup_value)) > 0 Then
a = a + 1
Found = True
i = InStr(i, str, UCase(lookup_value)) + 1
End If
Loop
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind3 = Value
此功能可让您从一定范围内找到相似的字符串,而无需进行精确搜索。
公式如下所示:=FuzzyFind(A1,B$1:B$20) 假设您要搜索的字符串在 A1 中 并且您的参考或选项 table 是 B1:B20
代码在这里:
Function FuzzyFind(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
For Each cell In tbl_array
str = cell
For i = 1 To Len(lookup_value)
If InStr(cell, Mid(lookup_value, i, 1)) > 0 Then
a = a + 1
cell = Mid(cell, 1, InStr(cell, Mid(lookup_value, i, 1)) - 1) & Mid(cell, InStr(cell, Mid(lookup_value, i, 1)) + 1, 9999)
End If
Next i
a = a - Len(cell)
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind = Value
End Function
这个函数的结果是命中注定的。谁能提高这个算法的智能?
谢谢:)
我不确定 "FuzzyFind" 到底包含什么,但这是一个使用 Levenshtein distance 查找相似数据的 VLOOKUP。
Levenshtein 距离让您可以 select 指定一个 "percentage match",而不是普通 VLOOKUP 中的典型 TRUE
或 FALSE
:
用法是:DTVLookup(A1,$C:$C0,1,90)
其中 90 是 Levenshtein 距离。
DTVLookup(Value To Find, Range to Search, Column to Return, [Percentage Match])
我通常在比较来自不同数据库的名称时使用它,例如:
Correct Name Example Lookup Percentage Match Other Report
John S Smith John Smith 83 John Smith
Barb Jones Barbara Jones 77 Barbara Jones
Jeffrey Bridge Jeff Bridge 79 Jeff Bridge
Joseph Park Joseph P. Park 79 Joseph P. Park
Jefrey Jones jefre jon 75 jefre jon
Peter Bridge peter f. bridge 80 peter f. bridge
代码如下:
Function DTVLookup(TheValue As Variant, TheRange As Range, TheColumn As Long, Optional PercentageMatch As Double = 100) As Variant
If TheColumn < 1 Then
DTVLookup = CVErr(xlErrValue)
Exit Function
End If
If TheColumn > TheRange.Columns.Count Then
DTVLookup = CVErr(xlErrRef)
Exit Function
End If
Dim c As Range
For Each c In TheRange.Columns(1).Cells
If UCase(TheValue) = UCase(c) Then
DTVLookup = c.Offset(0, TheColumn - 1)
Exit Function
ElseIf PercentageMatch <> 100 Then
If Levenshtein3(UCase(TheValue), UCase(c)) >= PercentageMatch Then
DTVLookup = c.Offset(0, TheColumn - 1)
Exit Function
End If
End If
Next c
DTVLookup = CVErr(xlErrNA)
End Function
Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long
string1_length = Len(string1): string2_length = Len(string2)
distance(0, 0) = 0
For i = 1 To string1_length: distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length: distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
For j = 1 To string2_length
If smStr1(i) = smStr2(j) Then
distance(i, j) = distance(i - 1, j - 1)
Else
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min2 < min1 Then
If min2 < min3 Then minmin = min2 Else minmin = min3
Else
If min1 < min3 Then minmin = min1 Else minmin = min3
End If
distance(i, j) = minmin
End If
Next
Next
' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)
End Function
试试这个,我想它会找到最佳匹配
Function FuzzyFind2(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean
b = 0
For Each cell In tbl_array
str = cell
i = 1
Found = True
Do While Found = True
Found = False
If InStr(i, str, lookup_value) > 0 Then
a = a + 1
Found = True
i = InStr(i, str, lookup_value) + 1
End If
Loop
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind2 = Value
End Function
我一直在寻找这个主题,福尔摩斯四世的答案无疑是最好的。我只想添加一个小更新以始终以大写形式进行比较。对于我的问题,它向我推荐了更准确的选项。
Function FuzzyFind3(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean
b = 0
For Each cell In tbl_array
str = UCase(cell)
i = 1
Found = True
Do While Found = True
Found = False
If InStr(i, str, UCase(lookup_value)) > 0 Then
a = a + 1
Found = True
i = InStr(i, str, UCase(lookup_value)) + 1
End If
Loop
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
FuzzyFind3 = Value