匹配两个不同 sheet 中两列的部分文本字符串 (90%)

Match partial text string (90%) two column in two different sheet

我正在尝试将 sheet 列中的 (90%) 部分文本字符串匹配到另一个 sheet 列,并将最终结果提供给主 sheet 列。 我找到了 VBA 解决方案,但我遇到了一些问题。 1)它匹配准确的文本 2) 找到一个问题来匹配两个不同的 sheet 列。

请帮我解决这个问题。

Sub lookup()
Dim TotalRows As Long
Dim rng As Range
Dim i As Long

'Copy lookup values from sheet1 to sheet3
Sheets("BANK STATEMENT ENTRY").Select
TotalRows = ActiveSheet.UsedRange.Rows.Count
Range("F3:F" & TotalRows).Copy Destination:=Sheets("TEST").Range("A1")

'Go to the destination sheet
Sheets("TEST").Select

For i = 1 To TotalRows
    'Search for the value on sheet2
    Set rng = Sheets("INFO").UsedRange.Find(Cells(i, 1).Value)
    'If it is found put its value on the destination sheet
    If Not rng Is Nothing Then
        Cells(i, 2).Value = rng.Value
    End If
Next
End Sub

我做过一个文本挖掘项目,我知道你不能使用那种方法,你必须将字符串分解成子字符串,然后分析它们。这将是一个完整的项目,但你很幸运,因为我为你做了。

让我们简化问题,假设您有两个范围的字符串,并且您想要找到两组之间所有相似的字符串。此外,您希望有一个容差来最小化匹配对。

假设 ABCDE 和 12BCD00。它们的共同点有 B、C、D、BC、CD 和 BCD。所以最长的公共子串是 BCD,它是 3 个字符:3/ABCDE(5) 的长度将与第一个字符串有 60% 的相似度,3/7=43% 的相似度。因此,如果您可以获得两个范围内所有字符串中所有这些公共子字符串的列表,您可以想出一个更好的列表来过滤并获得您想要的内容。

我写了一堆函数。要轻松使用它,只需将两组字符串复制并粘贴到一个 sheet 中,然后在同一 sheet 上生成最终报告,以了解其工作原理。

Function FuzzyFind,查找所有公共子字符串,并为您提供来自 Group1/range1 的第一个字符串、来自 group2/range2 的第二个字符串、公共子字符串和两个字符串的相似度百分比。好处是你可以告诉函数你想要你的子串有多小,例如在前面的例子中,如果你说iMinCommonSubLength=3,它只会给你BCD,如果你说iMinCommonSubLength=2,它会给你BC,CD和BCD等等。

使用函数 Main。我还包括了一个测试子。

函数:

Sub TestIt()
    Call Main(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("B1:B10"), 4, ActiveSheet.Range("D1"))
End Sub

Sub Main(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer, Optional rngReportUpperLeftCell As Range)
    Dim arr() As Variant
    Dim rngReport As Range

    If rngReport Is Nothing Then Set rngReport = ActiveSheet.Range("A1")

    arr = FuzzyFind(rng1, rng2, iMinCommonSubLength)
    Set rngReport = rngReportUpperLeftCell.Resize(UBound(arr, 1), UBound(arr, 2))

    rngReport.Value = arr
    rngReport.Columns(1).NumberFormat = "@"
    rngReport.Columns(2).NumberFormat = "@"
    rngReport.Columns(3).NumberFormat = "@"
    rngReport.Columns(4).NumberFormat = "0%"
    rngReport.Columns(5).NumberFormat = "0%"
End Sub

Function GetCharacters(str As String) As Variant
    Dim arr() As String
    ReDim arr(Len(str) - 1)
    For i = 1 To Len(str)
        arr(i - 1) = Mid$(UCase(str), i, 1)
    Next
    GetCharacters = arr
End Function


Function GetIterations(iStringLength As Integer, iSubStringLength As Integer) As Integer

    If iStringLength >= iSubStringLength Then
        GetIterations = iStringLength - iSubStringLength + 1
    Else
        GetIterations = 0
    End If
End Function


Function GetSubtrings(str As String, iSubLength As Integer) As Variant
    Dim i As Integer
    Dim count As Integer
    Dim arr() As Variant

    count = GetIterations(Len(str), iSubLength)
    ReDim arr(1 To count)

    For i = 1 To count
        arr(i) = Mid(str, i, iSubLength)
    Next i

    GetSubtrings = arr()
End Function

Function GetLongestCommonSubStrings(str1 As String, str2 As String, iMinCommonSubLeng As Integer)
    Dim i As Integer
    Dim iLongestPossible As Integer
    Dim iShortest  As Integer

    Dim arrSubs() As Variant

    Dim arr1() As Variant
    Dim arr2() As Variant

    ReDim arrSubs(1 To 1)

    'Longest possible common substring length is the smaller string's length
    iLongestPossible = IIf(Len(str1) > Len(str2), Len(str2), Len(str1))

    If iLongestPossible < iMinCommonSubLeng Then
        'MsgBox "Minimum common substring length is larger than the shortest string." & _
        '   " You have to choose a smaller common length", , "Error"
    Else
        'We will try to find the first match of common substrings of two given strings, exit after the first match
        For i = iLongestPossible To iMinCommonSubLeng Step -1
            arr1 = GetSubtrings(str1, i)
            arr2 = GetSubtrings(str2, i)
            ReDim arrSubs(1 To 1)
            arrSubs = GetCommonElement(arr1, arr2)

            If arrSubs(1) <> "" Then Exit For 'if you want JUST THE LONGEST MATCH, comment out this line
        Next i
    End If

    GetLongestCommonSubStrings = arrSubs
End Function

Function GetCommonElement(arr1() As Variant, arr2() As Variant) As Variant
    Dim i As Integer
    Dim j As Integer
    Dim count As Integer
    Dim arr() As Variant

    count = 1
    ReDim arr(1 To count)

    For i = 1 To UBound(arr1)
        For j = 1 To UBound(arr2)
            If arr1(i) = arr2(j) Then
                ReDim Preserve arr(1 To count)
                arr(count) = arr1(i)
                count = count + 1
            End If
        Next j
    Next i

    GetCommonElement = arr
End Function

Function FuzzyFind(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer) As Variant
    Dim count As Integer
    Dim i As Integer
    Dim arrSubs As Variant
    Dim str1 As String
    Dim str2 As String
    Dim cell1 As Range
    Dim cell2 As Range
    Dim rngReport  As Range
    Dim arr() As Variant 'array of all cells that are partially matching, str1, str2, common string, percentage

    count = 1
    ReDim arr(1 To 5, 1 To count)

    For Each cell1 In rng1
        str1 = UCase(CStr(cell1.Value))
        If str1 <> "" Then
            For Each cell2 In rng2
                str2 = UCase(CStr(cell2.Value))
                If str2 <> "" Then
                    ReDim arrSubs(1 To 1)
                    arrSubs = GetLongestCommonSubStrings(str1, str2, iMinCommonSubLength)
                    If arrSubs(1) <> "" Then
                        For i = 1 To UBound(arrSubs)
                            arr(1, count) = cell1.Value
                            arr(2, count) = cell2.Value
                            arr(3, count) = arrSubs(i)
                            arr(4, count) = Len(arrSubs(i)) / Len(str1)
                            arr(5, count) = Len(arrSubs(i)) / Len(str2)
                            count = count + 1
                            ReDim Preserve arr(1 To 5, 1 To count)
                        Next i
                    End If
                End If
            Next cell2
        End If
    Next cell1

    FuzzyFind = TransposeArray(arr)

End Function


Function TransposeArray(arr As Variant) As Variant
   Dim arrTemp() As Variant
   ReDim arrTemp(LBound(arr, 2) To UBound(arr, 2), LBound(arr, 1) To UBound(arr, 1))
   For a = LBound(arr, 2) To UBound(arr, 2)
     For b = LBound(arr, 1) To UBound(arr, 1)
        arrTemp(a, b) = arr(b, a)
     Next b
   Next a
   TransposeArray = arrTemp
End Function    

不要忘记在生成新报告之前清除 sheet。插入 table 并使用其自动过滤器轻松过滤您的内容。

最后但同样重要的是,不要忘记单击复选标记宣布这是您问题的答案。