Excel 数据验证协助
Excel Data Verification Assistance
我面临着一个有趣的数据验证挑战,每个月我的经理都会对多达 2 万行数据(这完全是手动的)执行数据验证 activity,她会在其中查找重复值列 (EventID),如果存在重复值,则比较另一列 (subType) 的值(如果值不相同)。这两行都被复制到不同的 sheet 以进行辅助处理。
我仍在学习 VBA,所以我认为这是一个成长过程 activity,因为这会让我的经理在她的日子里有很多时间。
我有目前正在处理的代码,但我发现我开始走错路了,有人能帮忙吗?
Sub Find_changes ()
Dim eventID As Range
Dim subtype As Range
Dim cell As Range
Dim LastRow As Long
Dim Reader As Worksheet
Dim Writer As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
Set eventID = Reader.Range("b:b")
Set subtype = Reader.Range("j:j")
Let LastRow = Writer.Cells(Rows.Count, 1).End(xlUp).Row + 1
For Each cell In eventID
If eventID = eventID And subtype <> subtype Then
cell.EntireRow.Copy Destination:=Writer.Range(LastRow)
End If
Next
End Sub
我还在下面包含了一个模拟数据集:
Mock Data Set
理想情况下我想要实现的(以图片为例)是:
- 阅读 B 列(事件 ID)以识别重复项
- 重复 (824466) 比较值列 J(报告子类型)
- 如果值不同(此示例为 SubType 1 和 SubType 2)
- 复制两行以分开 sheet
对于大量单元格,将范围存储为数组并在 VBA 内存中使用它们通常比直接在 Excel 中使用它们更有好处。您会发现在 VBA 内存中处理数据比直接处理 Excel 快得多。
下面是一段代码(非常需要您进行修改才能使用),它将引导您朝着正确的方向前进。我在代码片段下方添加了注释来解释方法,并指出您还需要做些什么才能使这项工作为您所用。
Option Explicit
Sub Find_changes_modified()
' Reference the sheet with the data
Dim Reader As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
' Store the entire dataset as a range
Dim RangeReader As Range
Set RangeReader = Reader.Range("A1:J6") ' ***
' Sort the range from lowest to highest EventID, and lowest to highest Report Subtype
' Sorting the range allows us to compare EventIDs that are next to one another
With Reader.Sort
.SortFields.Clear
.SortFields.Add2 Key:=Range("B2:B6"), Order:=xlAscending ' ***
.SortFields.Add2 Key:=Range("J2:J6"), Order:=xlAscending ' ***
.SetRange RangeReader
.Header = xlYes
.Apply
End With
' Store the entire range as an array
Dim ArrayReader() As Variant
ArrayReader = RangeReader.Value
' Column numbers of columns in the array
Dim ColumnNumberEventID As Long, ColumnNumberSubtype As Long
ColumnNumberEventID = 2 ' ***
ColumnNumberSubtype = 10 ' ***
' Store all duplicates in another array
' Make ArrayWriter the same size as ArrayReader
Dim ArrayWriter() As Variant
ReDim ArrayWriter(1 To UBound(ArrayReader, 1), 1 To UBound(ArrayReader, 2))
Dim Duplicates As Long
Duplicates = 0
' Iterate through the array and check for duplicates in the EventID column
Dim ii As Long, jj As Long
Dim ThisEventID As String, NextEventID As String, ThisSubType As String, NextSubType As String
For ii = LBound(ArrayReader, 1) To UBound(ArrayReader, 1) - 1
ThisEventID = ArrayReader(ii, ColumnNumberEventID)
NextEventID = ArrayReader(ii + 1, ColumnNumberEventID)
If ThisEventID = NextEventID Then
ThisSubType = ArrayReader(ii, ColumnNumberSubtype)
NextSubType = ArrayReader(ii + 1, ColumnNumberSubtype)
If ThisSubType <> NextSubType Then
Duplicates = Duplicates + 1
' Copy all of the row's information to the ArrayWriter
For jj = LBound(ArrayReader, 2) To UBound(ArrayReader, 2)
ArrayWriter(Duplicates, jj) = ArrayReader(ii, jj)
Next jj
End If
End If
Next ii
' Reference the sheet to write the duplicate data
Dim Writer As Worksheet
Set Writer = ThisWorkbook.Worksheets(3)
' Write the contents of the ArrayWriter to the other sheet
Writer.Range("A1:J1").Value = ArrayWriter
End Sub
大图是根据 2 列(事件 ID 和报告子类型)对数据进行排序,然后将数据 row-by-row 与其相邻数据进行比较。这种排序意味着我们只需要将每一行数据与其相邻数据进行比较,而不是每次都将多行与多行进行比较。
此代码中所有带有注释的地方***
表示实际使用时需要更改的内容。我使用了许多硬编码值来说明它是如何工作的。
我们首先使用数据 Reader
创建对 sheet 的引用,就像您最初所做的那样。接下来,我们将数据存储在一个范围内,RangeReader
。使用此范围,我们根据事件 ID 列对数据进行排序,然后根据报告子类型列对数据进行排序。
范围排序后,我们将其存储为数组 ArrayReader
,因此我们可以在 VBA 内存中使用它。我们还需要一个数组来存储我们遇到的任何重复数据行,ArrayWriter
。然后,遍历 ArrayReader
的内容并将每一行与其相邻行进行比较。如果一行及其邻居满足条件,则将其添加到 ArrayWriter
。完成所有这些后,将 ArrayWriter
信息写入 sheet.
您需要考虑的一些事项:
- 你能找到一种编程方式来确定
RangeReader
的单元格,而不是仅仅将它们手动输入代码吗?
- 你能找到一种编程方法来确定用于排序的每个范围吗
RangeReader
?
ColumnNumberEventID
和 ColumnNumberSubType
的值总是分别为 2 和 10 吗?如果不是,您如何确保您的代码始终引用正确的列?
- 当此代码找到重复项时,它只存储其中一项。您可能还想存储重复的其他项目。
请尝试下一种方法,对于更大的范围应该足够快。它使用脚本 Dictionary
来查找重复项,并根据创建的范围(使用 SpecialCells(xlCellTypeConstants)
)在使用范围之外的列中放置一个标记,该范围将在最后立即复制:
Sub Find_changes()
Dim Reader As Worksheet, LastRow As Long, Writer As Worksheet, rngCopy As Range
Dim arrID, arrType, arrMark, i As Long, maxCol As Long, maxRow As Long, dict As Object
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
LastRow = Writer.Range("B" & Writer.rows.count).End(xlUp).Row + 1
maxRow = Reader.Range("B" & Reader.rows.count).End(xlUp).Row
maxCol = Reader.UsedRange.Columns.count + 20
arrID = Reader.Range("B1:B" & maxRow).value 'plase the range in an array for faster iteration
arrType = Reader.Range("J1:J" & maxRow).value 'plase the range in an array for faster iteration
ReDim arrMark(1 To maxRow, 1 To 1)
Set dict = CreateObject("Scripting.Dictionary")
'Place the duplicates in a dictionary and mark their rows:
For i = 1 To maxRow
If Not dict.Exists(arrID(i, 1)) Then
dict.Add arrID(i, 1), Array(arrType(i, 1), i)
Else
If dict(arrID(i, 1))(0) <> arrType(i, 1) Then
arrMark(i, 1) = "Copy" 'write "Copy" in arrMark
arrMark(dict(arrID(i, 1))(1), 1) = "Copy" 'write "Copy" in arrMark
End If
End If
Next
'drop the marker array after the last column:
Reader.cells(1, maxCol).Resize(UBound(arrMark), 1).value = arrMark
'Extract the range to be copied (at once)
On Error Resume Next
Set rngCopy = Reader.Range(Reader.cells(1, maxCol), Reader.cells(maxRow, maxCol)).SpecialCells(xlCellTypeConstants)
On Error GoTo 0
If Not rngCopy Is Nothing Then
rngCopy.ClearContents
rngCopy.EntireRow.Copy Writer.Range("A" & LastRow)
End If
MsgBox "Ready..."
End Sub
我试着评论代码行,但如果有什么地方不够清楚,请不要犹豫,要求澄清。
我面临着一个有趣的数据验证挑战,每个月我的经理都会对多达 2 万行数据(这完全是手动的)执行数据验证 activity,她会在其中查找重复值列 (EventID),如果存在重复值,则比较另一列 (subType) 的值(如果值不相同)。这两行都被复制到不同的 sheet 以进行辅助处理。
我仍在学习 VBA,所以我认为这是一个成长过程 activity,因为这会让我的经理在她的日子里有很多时间。
我有目前正在处理的代码,但我发现我开始走错路了,有人能帮忙吗?
Sub Find_changes ()
Dim eventID As Range
Dim subtype As Range
Dim cell As Range
Dim LastRow As Long
Dim Reader As Worksheet
Dim Writer As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
Set eventID = Reader.Range("b:b")
Set subtype = Reader.Range("j:j")
Let LastRow = Writer.Cells(Rows.Count, 1).End(xlUp).Row + 1
For Each cell In eventID
If eventID = eventID And subtype <> subtype Then
cell.EntireRow.Copy Destination:=Writer.Range(LastRow)
End If
Next
End Sub
我还在下面包含了一个模拟数据集:
Mock Data Set
理想情况下我想要实现的(以图片为例)是:
- 阅读 B 列(事件 ID)以识别重复项
- 重复 (824466) 比较值列 J(报告子类型)
- 如果值不同(此示例为 SubType 1 和 SubType 2)
- 复制两行以分开 sheet
对于大量单元格,将范围存储为数组并在 VBA 内存中使用它们通常比直接在 Excel 中使用它们更有好处。您会发现在 VBA 内存中处理数据比直接处理 Excel 快得多。
下面是一段代码(非常需要您进行修改才能使用),它将引导您朝着正确的方向前进。我在代码片段下方添加了注释来解释方法,并指出您还需要做些什么才能使这项工作为您所用。
Option Explicit
Sub Find_changes_modified()
' Reference the sheet with the data
Dim Reader As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
' Store the entire dataset as a range
Dim RangeReader As Range
Set RangeReader = Reader.Range("A1:J6") ' ***
' Sort the range from lowest to highest EventID, and lowest to highest Report Subtype
' Sorting the range allows us to compare EventIDs that are next to one another
With Reader.Sort
.SortFields.Clear
.SortFields.Add2 Key:=Range("B2:B6"), Order:=xlAscending ' ***
.SortFields.Add2 Key:=Range("J2:J6"), Order:=xlAscending ' ***
.SetRange RangeReader
.Header = xlYes
.Apply
End With
' Store the entire range as an array
Dim ArrayReader() As Variant
ArrayReader = RangeReader.Value
' Column numbers of columns in the array
Dim ColumnNumberEventID As Long, ColumnNumberSubtype As Long
ColumnNumberEventID = 2 ' ***
ColumnNumberSubtype = 10 ' ***
' Store all duplicates in another array
' Make ArrayWriter the same size as ArrayReader
Dim ArrayWriter() As Variant
ReDim ArrayWriter(1 To UBound(ArrayReader, 1), 1 To UBound(ArrayReader, 2))
Dim Duplicates As Long
Duplicates = 0
' Iterate through the array and check for duplicates in the EventID column
Dim ii As Long, jj As Long
Dim ThisEventID As String, NextEventID As String, ThisSubType As String, NextSubType As String
For ii = LBound(ArrayReader, 1) To UBound(ArrayReader, 1) - 1
ThisEventID = ArrayReader(ii, ColumnNumberEventID)
NextEventID = ArrayReader(ii + 1, ColumnNumberEventID)
If ThisEventID = NextEventID Then
ThisSubType = ArrayReader(ii, ColumnNumberSubtype)
NextSubType = ArrayReader(ii + 1, ColumnNumberSubtype)
If ThisSubType <> NextSubType Then
Duplicates = Duplicates + 1
' Copy all of the row's information to the ArrayWriter
For jj = LBound(ArrayReader, 2) To UBound(ArrayReader, 2)
ArrayWriter(Duplicates, jj) = ArrayReader(ii, jj)
Next jj
End If
End If
Next ii
' Reference the sheet to write the duplicate data
Dim Writer As Worksheet
Set Writer = ThisWorkbook.Worksheets(3)
' Write the contents of the ArrayWriter to the other sheet
Writer.Range("A1:J1").Value = ArrayWriter
End Sub
大图是根据 2 列(事件 ID 和报告子类型)对数据进行排序,然后将数据 row-by-row 与其相邻数据进行比较。这种排序意味着我们只需要将每一行数据与其相邻数据进行比较,而不是每次都将多行与多行进行比较。
此代码中所有带有注释的地方***
表示实际使用时需要更改的内容。我使用了许多硬编码值来说明它是如何工作的。
我们首先使用数据 Reader
创建对 sheet 的引用,就像您最初所做的那样。接下来,我们将数据存储在一个范围内,RangeReader
。使用此范围,我们根据事件 ID 列对数据进行排序,然后根据报告子类型列对数据进行排序。
范围排序后,我们将其存储为数组 ArrayReader
,因此我们可以在 VBA 内存中使用它。我们还需要一个数组来存储我们遇到的任何重复数据行,ArrayWriter
。然后,遍历 ArrayReader
的内容并将每一行与其相邻行进行比较。如果一行及其邻居满足条件,则将其添加到 ArrayWriter
。完成所有这些后,将 ArrayWriter
信息写入 sheet.
您需要考虑的一些事项:
- 你能找到一种编程方式来确定
RangeReader
的单元格,而不是仅仅将它们手动输入代码吗? - 你能找到一种编程方法来确定用于排序的每个范围吗
RangeReader
? ColumnNumberEventID
和ColumnNumberSubType
的值总是分别为 2 和 10 吗?如果不是,您如何确保您的代码始终引用正确的列?- 当此代码找到重复项时,它只存储其中一项。您可能还想存储重复的其他项目。
请尝试下一种方法,对于更大的范围应该足够快。它使用脚本 Dictionary
来查找重复项,并根据创建的范围(使用 SpecialCells(xlCellTypeConstants)
)在使用范围之外的列中放置一个标记,该范围将在最后立即复制:
Sub Find_changes()
Dim Reader As Worksheet, LastRow As Long, Writer As Worksheet, rngCopy As Range
Dim arrID, arrType, arrMark, i As Long, maxCol As Long, maxRow As Long, dict As Object
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
LastRow = Writer.Range("B" & Writer.rows.count).End(xlUp).Row + 1
maxRow = Reader.Range("B" & Reader.rows.count).End(xlUp).Row
maxCol = Reader.UsedRange.Columns.count + 20
arrID = Reader.Range("B1:B" & maxRow).value 'plase the range in an array for faster iteration
arrType = Reader.Range("J1:J" & maxRow).value 'plase the range in an array for faster iteration
ReDim arrMark(1 To maxRow, 1 To 1)
Set dict = CreateObject("Scripting.Dictionary")
'Place the duplicates in a dictionary and mark their rows:
For i = 1 To maxRow
If Not dict.Exists(arrID(i, 1)) Then
dict.Add arrID(i, 1), Array(arrType(i, 1), i)
Else
If dict(arrID(i, 1))(0) <> arrType(i, 1) Then
arrMark(i, 1) = "Copy" 'write "Copy" in arrMark
arrMark(dict(arrID(i, 1))(1), 1) = "Copy" 'write "Copy" in arrMark
End If
End If
Next
'drop the marker array after the last column:
Reader.cells(1, maxCol).Resize(UBound(arrMark), 1).value = arrMark
'Extract the range to be copied (at once)
On Error Resume Next
Set rngCopy = Reader.Range(Reader.cells(1, maxCol), Reader.cells(maxRow, maxCol)).SpecialCells(xlCellTypeConstants)
On Error GoTo 0
If Not rngCopy Is Nothing Then
rngCopy.ClearContents
rngCopy.EntireRow.Copy Writer.Range("A" & LastRow)
End If
MsgBox "Ready..."
End Sub
我试着评论代码行,但如果有什么地方不够清楚,请不要犹豫,要求澄清。