我想检查 excel 中相应单元格中文本的相似性

I want to check for similarity for text in corresponding cells in excel

我的excel有一个数据集sheet,每个单元格中的数据是一组用“;”分隔的数字。

预期结果在第二个单元格中,所有四个数字都在两列(G 和 H)中,但顺序不同。在下一行中,顺序相同。那么有什么方法可以检查相似性

我试过使用下面的代码,但它似乎只突出显示(红色)前几个字符

If anyone wants to see the file then click here

密码是:

Sub highlight()
    Dim xRg1 As Range
    Dim xRg2 As Range
    Dim xTxt As String
    Dim xCell1 As Range
    Dim xCell2 As Range
    Dim I As Long
    Dim J As Integer
    Dim xLen As Integer
    Dim xDiffs As Boolean
    On Error Resume Next
    If ActiveWindow.RangeSelection.Count > 1 Then
      xTxt = ActiveWindow.RangeSelection.AddressLocal
    Else
      xTxt = ActiveSheet.UsedRange.AddressLocal
    End If
lOne:
    Set xRg1 = Application.InputBox("Range A:", "Similarity finder", xTxt, , , , , 8)
    If xRg1 Is Nothing Then Exit Sub
    If xRg1.Columns.Count > 1 Or xRg1.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lOne
    End If
lTwo:
    Set xRg2 = Application.InputBox("Range B:", "Similarity finder", "", , , , , 8)
    If xRg2 Is Nothing Then Exit Sub
    If xRg2.Columns.Count > 1 Or xRg2.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lTwo
    End If
    If xRg1.CountLarge <> xRg2.CountLarge Then
       MsgBox "Two selected ranges must have the same numbers of cells ", vbInformation, "Similarity finder"
       GoTo lTwo
    End If
    xDiffs = (MsgBox("Click Yes to highlight similarities, click No to highlight differences ", vbYesNo + vbQuestion, "Similarity finder") = vbNo)
    Application.ScreenUpdating = False
    xRg2.Font.ColorIndex = xlAutomatic
    For I = 1 To xRg1.Count
        Set xCell1 = xRg1.Cells(I)
        Set xCell2 = xRg2.Cells(I)
        If xCell1.Value2 = xCell2.Value2 Then
            If Not xDiffs Then xCell2.Font.Color = vbRed
        Else
            xLen = Len(xCell1.Value2)
            For J = 1 To xLen
                If Not xCell1.Characters(J, 1).Text = xCell2.Characters(J, 1).Text Then Exit For
            Next J
            If Not xDiffs Then
                If J <= Len(xCell2.Value2) And J > 1 Then
                    xCell2.Characters(1, J - 1).Font.Color = vbRed
                End If
            Else
                If J <= Len(xCell2.Value2) Then
                    xCell2.Characters(J, Len(xCell2.Value2) - J + 1).Font.Color = vbRed
                End If
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub

一种方法是使用第一列创建正则表达式并将其应用于第二列。

使用正则表达式的一个优点是返回的数据之一是匹配的开始和长度——非常适合寻址范围对象的字符 属性。

我使用了早期绑定(参见代码注释中要设置的参考),但如果必须,您可以使用后期绑定。

我在 A 列和 B 列中也有数据,但您可以使用定义数据位置的代码部分更改它。

您不需要使用任何 On Error 代码。编写代码来处理任何可预见的错误要好得多。我没有做任何错误检查,可能需要添加。

如果速度有问题,可以进行各种其他修改。

构建的正则表达式将具有

的一般外观

\b(?:nnn|nnn|nnn|nnn)\b

这意味着

  • 匹配单词边界
  • 匹配任何竖线分隔的子字符串
  • 匹配另一个单词边界。

有关详细信息,请参阅 如何在 Microsoft Excel 单元格和循环中使用正则表达式 (Regex)

Option Explicit
'Set reference to Microsoft VBScript Regular Expressions 5.5
Sub highLight()
    Dim R As Range, C  As Range, WS As Worksheet
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim sSplit As String
    
'set the data range
'   one column wide
'   column 2 will be offset 1 to the left
'Obviously you can change this in many ways
'And also work in your user selected method as in your code.
'only requirement is that the ranges be single column, and you can
'easily check for that
Set WS = ThisWorkbook.Worksheets("Sheet4")
With WS
    Set R = Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With

Set RE = New RegExp
Application.ScreenUpdating = False
With RE
    .Global = True
    
    'loop through the first column
        For Each C In R.Rows
            'replace the semicolon with the pipe
            sSplit = Replace(Join(Split(C.Value, ";"), "|"), " ", "")
                
                'since data has a terminal semi-colon, need to remove it if present
                If Right(sSplit, 1) = "|" Then sSplit = Left(sSplit, Len(sSplit) - 1)
            
            'finish construction of the regex pattern
            .Pattern = "\b(?:" & sSplit & ")\b"
            
            'check for matches and change relevant characters font color
            Set MC = .Execute(C.Offset(columnoffset:=1))
            With C.Offset(-0, 1)
                .Font.Color = vbBlack
                For Each M In MC
                    .Characters(M.FirstIndex + 1, M.Length).Font.Color = vbRed
                Next M
            End With
        Next C
End With
End Sub

相似之处

  • 您的数据存在问题:第 A 列中的值以“;”结尾,而第 B 列中的值以“;”结尾。提出的解决方案是仅将最后一个分号 ; 之前的字符串与 Split 函数一起使用。
  • 为避免复杂化,我将代码集成到您相当酷的解决方案中。

代码

Option Explicit

Sub highlightOrig()
    Dim xRg1 As Range
    Dim xRg2 As Range
    Dim xTxt As String
    Dim xCell1 As Range
    Dim xCell2 As Range
    Dim I As Long
    Dim J As Integer
    Dim xLen As Integer
    Dim xDiffs As Boolean
    On Error Resume Next
    If ActiveWindow.RangeSelection.Count > 1 Then
      xTxt = ActiveWindow.RangeSelection.AddressLocal
    Else
      xTxt = ActiveSheet.UsedRange.AddressLocal
    End If
lOne:
    Set xRg1 = Application.InputBox("Range A:", "Similarity finder", xTxt, , , , , 8)
    If xRg1 Is Nothing Then Exit Sub
    If xRg1.Columns.Count > 1 Or xRg1.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lOne
    End If
lTwo:
    Set xRg2 = Application.InputBox("Range B:", "Similarity finder", "", , , , , 8)
    If xRg2 Is Nothing Then Exit Sub
    If xRg2.Columns.Count > 1 Or xRg2.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lTwo
    End If
    If xRg1.CountLarge <> xRg2.CountLarge Then
       MsgBox "Two selected ranges must have the same numbers of cells ", vbInformation, "Similarity finder"
       GoTo lTwo
    End If
    xDiffs = (MsgBox("Click Yes to highlight similarities, click No to highlight differences ", vbYesNo + vbQuestion, "Similarity finder") = vbNo)
    Application.ScreenUpdating = False
    
    xRg1.Font.ColorIndex = xlAutomatic
    xRg2.Font.ColorIndex = xlAutomatic
    
    Const Delimiter As String = "; "
    Dim dLen As Long: dLen = Len(Delimiter)
    Dim s() As String
    Dim d() As String
    Dim cString As String
    Dim n As Long
    Dim cStart As Long
    
    If Not xDiffs Then
        For I = 1 To xRg1.Count
            cStart = 1
            Set xCell1 = xRg1.Cells(I)
            Set xCell2 = xRg2.Cells(I)
            cString = Left(xCell1.Value, InStrRev(xCell1.Value, ";") - 1)
            s = Split(cString, Delimiter)
            cString = Left(xCell2.Value, InStrRev(xCell2.Value, ";") - 1)
            d = Split(cString, Delimiter)
            For n = 0 To UBound(d)
                If IsNumeric(Application.Match(d(n), s, 0)) Then
                    xCell2.Characters(cStart, Len(d(n))).Font.Color = vbRed
                End If
                cStart = cStart + Len(d(n)) + dLen
            Next n
        Next I
    Else
        For I = 1 To xRg1.Count
            cStart = 1
            Set xCell1 = xRg1.Cells(I)
            Set xCell2 = xRg2.Cells(I)
            cString = Left(xCell1.Value, InStrRev(xCell1.Value, ";") - 1)
            s = Split(cString, Delimiter)
            cString = Left(xCell2.Value, InStrRev(xCell2.Value, ";") - 1)
            d = Split(cString, Delimiter)
            For n = 0 To UBound(d)
                If IsError(Application.Match(d(n), s, 0)) Then
                    xCell2.Characters(cStart, Len(d(n))).Font.Color = vbRed
                End If
                cStart = cStart + Len(d(n)) + dLen
            Next n
        Next I
    End If
    
    Application.ScreenUpdating = True

End Sub

你的问题的核心是如何测试你的相似性概念。

这是一个函数,当给定一个由分号分隔的项目组成的字符串时,returns True 如果项目相同,尽管可能顺序不同,并且 False 否则。关键思想是在 ";" 上拆分,对结果数组进行排序,然后重新加入。这会将字符串置于标准排序顺序中,然后可以直接比较它们是否相等:

Function Sort(A As Variant) As Variant
    Dim sorted As Variant, item As Variant
    Dim lb As Long, ub As Long
    Dim i As Long
    Dim items As Object
    Set items = CreateObject("System.Collections.ArrayList")
    
    lb = LBound(A)
    ub = UBound(A)
    ReDim sorted(lb To ub)
    
    For i = lb To ub
        item = A(i) 'Somewhat oddly, seems needed
        items.Add item
    Next i
    
    items.Sort
    
    For i = lb To ub
        sorted(i) = items(i - lb)
    Next i
    
    Sort = sorted
End Function

Function Similar(s1 As String, s2 As String) As Boolean
    Similar = (Join(Sort(Split(s1, ";")), ";") = Join(Sort(Split(s2, ";")), ";"))
End Function

例如,Similar("12;34;56","34;56;12") 的计算结果为 True,而 Similar("12;34;56","34;16;12") 的计算结果为 False

在您的情况下,您有尾随分号,因此它们不会以标准方式用作分隔符。事实证明这无关紧要:Similar("12;34;56;","34;56;12;") 仍然按预期工作。