匹配两个不同 sheet 中两列的部分文本字符串 (90%)
Match partial text string (90%) two column in two different sheet
我正在尝试将 sheet 列中的 (90%) 部分文本字符串匹配到另一个 sheet 列,并将最终结果提供给主 sheet 列。
我找到了 VBA 解决方案,但我遇到了一些问题。
1)它匹配准确的文本
2) 找到一个问题来匹配两个不同的 sheet 列。
请帮我解决这个问题。
Sub lookup()
Dim TotalRows As Long
Dim rng As Range
Dim i As Long
'Copy lookup values from sheet1 to sheet3
Sheets("BANK STATEMENT ENTRY").Select
TotalRows = ActiveSheet.UsedRange.Rows.Count
Range("F3:F" & TotalRows).Copy Destination:=Sheets("TEST").Range("A1")
'Go to the destination sheet
Sheets("TEST").Select
For i = 1 To TotalRows
'Search for the value on sheet2
Set rng = Sheets("INFO").UsedRange.Find(Cells(i, 1).Value)
'If it is found put its value on the destination sheet
If Not rng Is Nothing Then
Cells(i, 2).Value = rng.Value
End If
Next
End Sub
我做过一个文本挖掘项目,我知道你不能使用那种方法,你必须将字符串分解成子字符串,然后分析它们。这将是一个完整的项目,但你很幸运,因为我为你做了。
让我们简化问题,假设您有两个范围的字符串,并且您想要找到两组之间所有相似的字符串。此外,您希望有一个容差来最小化匹配对。
假设 ABCDE 和 12BCD00。它们的共同点有 B、C、D、BC、CD 和 BCD。所以最长的公共子串是 BCD,它是 3 个字符:3/ABCDE(5) 的长度将与第一个字符串有 60% 的相似度,3/7=43% 的相似度。因此,如果您可以获得两个范围内所有字符串中所有这些公共子字符串的列表,您可以想出一个更好的列表来过滤并获得您想要的内容。
我写了一堆函数。要轻松使用它,只需将两组字符串复制并粘贴到一个 sheet 中,然后在同一 sheet 上生成最终报告,以了解其工作原理。
Function FuzzyFind,查找所有公共子字符串,并为您提供来自 Group1/range1 的第一个字符串、来自 group2/range2 的第二个字符串、公共子字符串和两个字符串的相似度百分比。好处是你可以告诉函数你想要你的子串有多小,例如在前面的例子中,如果你说iMinCommonSubLength=3,它只会给你BCD,如果你说iMinCommonSubLength=2,它会给你BC,CD和BCD等等。
使用函数 Main。我还包括了一个测试子。
函数:
Sub TestIt()
Call Main(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("B1:B10"), 4, ActiveSheet.Range("D1"))
End Sub
Sub Main(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer, Optional rngReportUpperLeftCell As Range)
Dim arr() As Variant
Dim rngReport As Range
If rngReport Is Nothing Then Set rngReport = ActiveSheet.Range("A1")
arr = FuzzyFind(rng1, rng2, iMinCommonSubLength)
Set rngReport = rngReportUpperLeftCell.Resize(UBound(arr, 1), UBound(arr, 2))
rngReport.Value = arr
rngReport.Columns(1).NumberFormat = "@"
rngReport.Columns(2).NumberFormat = "@"
rngReport.Columns(3).NumberFormat = "@"
rngReport.Columns(4).NumberFormat = "0%"
rngReport.Columns(5).NumberFormat = "0%"
End Sub
Function GetCharacters(str As String) As Variant
Dim arr() As String
ReDim arr(Len(str) - 1)
For i = 1 To Len(str)
arr(i - 1) = Mid$(UCase(str), i, 1)
Next
GetCharacters = arr
End Function
Function GetIterations(iStringLength As Integer, iSubStringLength As Integer) As Integer
If iStringLength >= iSubStringLength Then
GetIterations = iStringLength - iSubStringLength + 1
Else
GetIterations = 0
End If
End Function
Function GetSubtrings(str As String, iSubLength As Integer) As Variant
Dim i As Integer
Dim count As Integer
Dim arr() As Variant
count = GetIterations(Len(str), iSubLength)
ReDim arr(1 To count)
For i = 1 To count
arr(i) = Mid(str, i, iSubLength)
Next i
GetSubtrings = arr()
End Function
Function GetLongestCommonSubStrings(str1 As String, str2 As String, iMinCommonSubLeng As Integer)
Dim i As Integer
Dim iLongestPossible As Integer
Dim iShortest As Integer
Dim arrSubs() As Variant
Dim arr1() As Variant
Dim arr2() As Variant
ReDim arrSubs(1 To 1)
'Longest possible common substring length is the smaller string's length
iLongestPossible = IIf(Len(str1) > Len(str2), Len(str2), Len(str1))
If iLongestPossible < iMinCommonSubLeng Then
'MsgBox "Minimum common substring length is larger than the shortest string." & _
' " You have to choose a smaller common length", , "Error"
Else
'We will try to find the first match of common substrings of two given strings, exit after the first match
For i = iLongestPossible To iMinCommonSubLeng Step -1
arr1 = GetSubtrings(str1, i)
arr2 = GetSubtrings(str2, i)
ReDim arrSubs(1 To 1)
arrSubs = GetCommonElement(arr1, arr2)
If arrSubs(1) <> "" Then Exit For 'if you want JUST THE LONGEST MATCH, comment out this line
Next i
End If
GetLongestCommonSubStrings = arrSubs
End Function
Function GetCommonElement(arr1() As Variant, arr2() As Variant) As Variant
Dim i As Integer
Dim j As Integer
Dim count As Integer
Dim arr() As Variant
count = 1
ReDim arr(1 To count)
For i = 1 To UBound(arr1)
For j = 1 To UBound(arr2)
If arr1(i) = arr2(j) Then
ReDim Preserve arr(1 To count)
arr(count) = arr1(i)
count = count + 1
End If
Next j
Next i
GetCommonElement = arr
End Function
Function FuzzyFind(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer) As Variant
Dim count As Integer
Dim i As Integer
Dim arrSubs As Variant
Dim str1 As String
Dim str2 As String
Dim cell1 As Range
Dim cell2 As Range
Dim rngReport As Range
Dim arr() As Variant 'array of all cells that are partially matching, str1, str2, common string, percentage
count = 1
ReDim arr(1 To 5, 1 To count)
For Each cell1 In rng1
str1 = UCase(CStr(cell1.Value))
If str1 <> "" Then
For Each cell2 In rng2
str2 = UCase(CStr(cell2.Value))
If str2 <> "" Then
ReDim arrSubs(1 To 1)
arrSubs = GetLongestCommonSubStrings(str1, str2, iMinCommonSubLength)
If arrSubs(1) <> "" Then
For i = 1 To UBound(arrSubs)
arr(1, count) = cell1.Value
arr(2, count) = cell2.Value
arr(3, count) = arrSubs(i)
arr(4, count) = Len(arrSubs(i)) / Len(str1)
arr(5, count) = Len(arrSubs(i)) / Len(str2)
count = count + 1
ReDim Preserve arr(1 To 5, 1 To count)
Next i
End If
End If
Next cell2
End If
Next cell1
FuzzyFind = TransposeArray(arr)
End Function
Function TransposeArray(arr As Variant) As Variant
Dim arrTemp() As Variant
ReDim arrTemp(LBound(arr, 2) To UBound(arr, 2), LBound(arr, 1) To UBound(arr, 1))
For a = LBound(arr, 2) To UBound(arr, 2)
For b = LBound(arr, 1) To UBound(arr, 1)
arrTemp(a, b) = arr(b, a)
Next b
Next a
TransposeArray = arrTemp
End Function
不要忘记在生成新报告之前清除 sheet。插入 table 并使用其自动过滤器轻松过滤您的内容。
最后但同样重要的是,不要忘记单击复选标记宣布这是您问题的答案。
我正在尝试将 sheet 列中的 (90%) 部分文本字符串匹配到另一个 sheet 列,并将最终结果提供给主 sheet 列。 我找到了 VBA 解决方案,但我遇到了一些问题。 1)它匹配准确的文本 2) 找到一个问题来匹配两个不同的 sheet 列。
请帮我解决这个问题。
Sub lookup()
Dim TotalRows As Long
Dim rng As Range
Dim i As Long
'Copy lookup values from sheet1 to sheet3
Sheets("BANK STATEMENT ENTRY").Select
TotalRows = ActiveSheet.UsedRange.Rows.Count
Range("F3:F" & TotalRows).Copy Destination:=Sheets("TEST").Range("A1")
'Go to the destination sheet
Sheets("TEST").Select
For i = 1 To TotalRows
'Search for the value on sheet2
Set rng = Sheets("INFO").UsedRange.Find(Cells(i, 1).Value)
'If it is found put its value on the destination sheet
If Not rng Is Nothing Then
Cells(i, 2).Value = rng.Value
End If
Next
End Sub
我做过一个文本挖掘项目,我知道你不能使用那种方法,你必须将字符串分解成子字符串,然后分析它们。这将是一个完整的项目,但你很幸运,因为我为你做了。
让我们简化问题,假设您有两个范围的字符串,并且您想要找到两组之间所有相似的字符串。此外,您希望有一个容差来最小化匹配对。
假设 ABCDE 和 12BCD00。它们的共同点有 B、C、D、BC、CD 和 BCD。所以最长的公共子串是 BCD,它是 3 个字符:3/ABCDE(5) 的长度将与第一个字符串有 60% 的相似度,3/7=43% 的相似度。因此,如果您可以获得两个范围内所有字符串中所有这些公共子字符串的列表,您可以想出一个更好的列表来过滤并获得您想要的内容。
我写了一堆函数。要轻松使用它,只需将两组字符串复制并粘贴到一个 sheet 中,然后在同一 sheet 上生成最终报告,以了解其工作原理。
Function FuzzyFind,查找所有公共子字符串,并为您提供来自 Group1/range1 的第一个字符串、来自 group2/range2 的第二个字符串、公共子字符串和两个字符串的相似度百分比。好处是你可以告诉函数你想要你的子串有多小,例如在前面的例子中,如果你说iMinCommonSubLength=3,它只会给你BCD,如果你说iMinCommonSubLength=2,它会给你BC,CD和BCD等等。
使用函数 Main。我还包括了一个测试子。
函数:
Sub TestIt()
Call Main(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("B1:B10"), 4, ActiveSheet.Range("D1"))
End Sub
Sub Main(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer, Optional rngReportUpperLeftCell As Range)
Dim arr() As Variant
Dim rngReport As Range
If rngReport Is Nothing Then Set rngReport = ActiveSheet.Range("A1")
arr = FuzzyFind(rng1, rng2, iMinCommonSubLength)
Set rngReport = rngReportUpperLeftCell.Resize(UBound(arr, 1), UBound(arr, 2))
rngReport.Value = arr
rngReport.Columns(1).NumberFormat = "@"
rngReport.Columns(2).NumberFormat = "@"
rngReport.Columns(3).NumberFormat = "@"
rngReport.Columns(4).NumberFormat = "0%"
rngReport.Columns(5).NumberFormat = "0%"
End Sub
Function GetCharacters(str As String) As Variant
Dim arr() As String
ReDim arr(Len(str) - 1)
For i = 1 To Len(str)
arr(i - 1) = Mid$(UCase(str), i, 1)
Next
GetCharacters = arr
End Function
Function GetIterations(iStringLength As Integer, iSubStringLength As Integer) As Integer
If iStringLength >= iSubStringLength Then
GetIterations = iStringLength - iSubStringLength + 1
Else
GetIterations = 0
End If
End Function
Function GetSubtrings(str As String, iSubLength As Integer) As Variant
Dim i As Integer
Dim count As Integer
Dim arr() As Variant
count = GetIterations(Len(str), iSubLength)
ReDim arr(1 To count)
For i = 1 To count
arr(i) = Mid(str, i, iSubLength)
Next i
GetSubtrings = arr()
End Function
Function GetLongestCommonSubStrings(str1 As String, str2 As String, iMinCommonSubLeng As Integer)
Dim i As Integer
Dim iLongestPossible As Integer
Dim iShortest As Integer
Dim arrSubs() As Variant
Dim arr1() As Variant
Dim arr2() As Variant
ReDim arrSubs(1 To 1)
'Longest possible common substring length is the smaller string's length
iLongestPossible = IIf(Len(str1) > Len(str2), Len(str2), Len(str1))
If iLongestPossible < iMinCommonSubLeng Then
'MsgBox "Minimum common substring length is larger than the shortest string." & _
' " You have to choose a smaller common length", , "Error"
Else
'We will try to find the first match of common substrings of two given strings, exit after the first match
For i = iLongestPossible To iMinCommonSubLeng Step -1
arr1 = GetSubtrings(str1, i)
arr2 = GetSubtrings(str2, i)
ReDim arrSubs(1 To 1)
arrSubs = GetCommonElement(arr1, arr2)
If arrSubs(1) <> "" Then Exit For 'if you want JUST THE LONGEST MATCH, comment out this line
Next i
End If
GetLongestCommonSubStrings = arrSubs
End Function
Function GetCommonElement(arr1() As Variant, arr2() As Variant) As Variant
Dim i As Integer
Dim j As Integer
Dim count As Integer
Dim arr() As Variant
count = 1
ReDim arr(1 To count)
For i = 1 To UBound(arr1)
For j = 1 To UBound(arr2)
If arr1(i) = arr2(j) Then
ReDim Preserve arr(1 To count)
arr(count) = arr1(i)
count = count + 1
End If
Next j
Next i
GetCommonElement = arr
End Function
Function FuzzyFind(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer) As Variant
Dim count As Integer
Dim i As Integer
Dim arrSubs As Variant
Dim str1 As String
Dim str2 As String
Dim cell1 As Range
Dim cell2 As Range
Dim rngReport As Range
Dim arr() As Variant 'array of all cells that are partially matching, str1, str2, common string, percentage
count = 1
ReDim arr(1 To 5, 1 To count)
For Each cell1 In rng1
str1 = UCase(CStr(cell1.Value))
If str1 <> "" Then
For Each cell2 In rng2
str2 = UCase(CStr(cell2.Value))
If str2 <> "" Then
ReDim arrSubs(1 To 1)
arrSubs = GetLongestCommonSubStrings(str1, str2, iMinCommonSubLength)
If arrSubs(1) <> "" Then
For i = 1 To UBound(arrSubs)
arr(1, count) = cell1.Value
arr(2, count) = cell2.Value
arr(3, count) = arrSubs(i)
arr(4, count) = Len(arrSubs(i)) / Len(str1)
arr(5, count) = Len(arrSubs(i)) / Len(str2)
count = count + 1
ReDim Preserve arr(1 To 5, 1 To count)
Next i
End If
End If
Next cell2
End If
Next cell1
FuzzyFind = TransposeArray(arr)
End Function
Function TransposeArray(arr As Variant) As Variant
Dim arrTemp() As Variant
ReDim arrTemp(LBound(arr, 2) To UBound(arr, 2), LBound(arr, 1) To UBound(arr, 1))
For a = LBound(arr, 2) To UBound(arr, 2)
For b = LBound(arr, 1) To UBound(arr, 1)
arrTemp(a, b) = arr(b, a)
Next b
Next a
TransposeArray = arrTemp
End Function
不要忘记在生成新报告之前清除 sheet。插入 table 并使用其自动过滤器轻松过滤您的内容。
最后但同样重要的是,不要忘记单击复选标记宣布这是您问题的答案。