Estimating duplication percentage between rows in Excel

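I have an Excel (2010) data file with over 200 variables (columns) and over 1,000 records (rows), each with a unique ID number. However, I suspect that some of these records are fabricated, i.e. someone took an existing record, copied it, and changed just a few numbers to make it slightly different. So I need to generate a matrix showing the number/percent of "same values" between each record and every other record (e.g. record 1 and record 2 share 75 equal values, record 1 and record 3 share 57 equal values, record 2 and record 3 share 45 equal values, etc.). I have some workarounds, but they take hours and do not produce a simple matrix. I don't care about the differences between values, only whether they are equal. Any ideas would be greatly appreciated!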

Not sure how this will perform on a large data set, but:

Sub T()

    Dim d, m(), nR As Long, nC As Long, r As Long, r2 As Long, c As Long
    Dim v1, v2, i As Long
    d = Sheet1.Range("A1").CurrentRegion.Value
    nR = UBound(d, 1)
    nC = UBound(d, 2)
    ReDim m(1 To nR, 1 To nR)

    For r = 1 To nR
        For r2 = r To nR
            i = 0
            For c = 1 To nC
                v1 = d(r, c): If IsError(v1) Then v1 = "Error!"
                v2 = d(r2, c): If IsError(v2) Then v2 = "Error!"
                If v1 = v2 Then i = i + 1
            Next c
            m(r2, r) = i
        Next r2
    Next r

    With Sheet2
        .Range("B2").Resize(nR, nR).Value = m
        'assuming your id's are in the first column...
        For r = 1 To nR
            .Cells(1 + r, 1) = d(r, 1)
            .Cells(r, r + 1) = d(r, 1)
        Next r
    End With

End Sub
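
The question also asks for a percentage, while T() writes raw counts. A minimal sketch of one way to convert them, assuming T() has already filled Sheet2 starting at B2, that there are at least two records, and that a sheet with the code name Sheet3 exists to receive the output. The name PercentMatrix and the choice of Sheet3 are hypothetical. Because T() counts every column, the ID column is part of the denominator here, so two distinct records can never reach 100%.

```vba
Sub PercentMatrix()
    ' Hypothetical helper: converts the counts written by T() on Sheet2
    ' into the share of columns with equal values, written to Sheet3.
    Dim nR As Long, nC As Long, r As Long, r2 As Long
    Dim cnt As Variant, p() As Double

    With Sheet1.Range("A1").CurrentRegion
        nR = .Rows.Count
        nC = .Columns.Count
    End With

    ' Read the whole counts block in one go (assumes nR >= 2,
    ' so that .Value returns a 2-D array rather than a single value).
    cnt = Sheet2.Range("B2").Resize(nR, nR).Value
    ReDim p(1 To nR, 1 To nR)

    For r = 1 To nR
        For r2 = r To nR
            ' T() fills only the lower triangle (row r2, column r);
            ' mirror it so the percentage matrix is symmetric.
            p(r2, r) = cnt(r2, r) / nC
            p(r, r2) = p(r2, r)
        Next r2
    Next r

    With Sheet3.Range("B2").Resize(nR, nR)
        .Value = p
        .NumberFormat = "0.0%"
    End With
End Sub
```

Only the lower triangle of the Sheet2 block is read, so the ID labels that T() places just above the diagonal do not interfere.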

I'm a bit clumsy with HTML and posting... not a programmer, so please forgive everything...

Sub CalculateDuplicationBetweenRecords()

Dim myCases As Long
Dim myVariables As Long
Dim myCurrentCase As Long
Dim myComparisonCase As Long
Dim myCurrentVariable As Long
'   Variant, not Long: cells may hold text or error values
Dim myCurrentCell As Variant
Dim myComparisonCell As Variant
Dim myCounter As Long

'   Would be nice to automate number of cases and variables...
myCases = 88
myVariables = 81
'   Insert case #1 id for results matrix - cosmetic...
Worksheets("Sheet2").Cells(1, 2).Value = Worksheets("Sheet1").Cells(1, 1).Value

For myCurrentCase = 1 To myCases - 1
    For myComparisonCase = myCurrentCase + 1 To myCases
        myCounter = 0
        For myCurrentVariable = 1 To myVariables
            myCurrentCell = Worksheets("Sheet1").Cells(myCurrentCase, myCurrentVariable).Value: If IsError(myCurrentCell) Then myCurrentCell = "Error!"
            myComparisonCell = Worksheets("Sheet1").Cells(myComparisonCase, myCurrentVariable).Value: If IsError(myComparisonCell) Then myComparisonCell = "Error!"
            If myCurrentCell = myComparisonCell Then myCounter = myCounter + 1
        Next myCurrentVariable
        Worksheets("Sheet2").Cells(myCurrentCase + 1, 1).Value = Worksheets("Sheet1").Cells(myCurrentCase, 1).Value
        Worksheets("Sheet2").Cells(1, myComparisonCase + 1).Value = Worksheets("Sheet1").Cells(myComparisonCase, 1).Value
        Worksheets("Sheet2").Cells(myCurrentCase + 1, myComparisonCase + 1).Value = myCounter
    Next myComparisonCase
Next myCurrentCase

End Sub
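
The hard-coded myCases and myVariables in the macro above could probably be read from the sheet instead. A small sketch, assuming the data on Sheet1 is one contiguous block starting at A1 with no header row (which the macro implies by comparing from row 1). The name ShowDataSize is hypothetical, and the two assignments could be dropped into the macro in place of the fixed 88 and 81.

```vba
Sub ShowDataSize()
    Dim myCases As Long
    Dim myVariables As Long

    ' CurrentRegion expands from A1 until it hits a blank row and column,
    ' so a fully blank row or column inside the data would cut it short.
    With Worksheets("Sheet1").Range("A1").CurrentRegion
        myCases = .Rows.Count
        myVariables = .Columns.Count
    End With

    ' Print to the Immediate window to check the detected size.
    Debug.Print myCases, myVariables
End Sub
```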