How to count duplicate entries in OpenOffice/LibreOffice BASIC?
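I have a large amount of data across many sheets in LibreOffice, with an ADDR column and a DATA column, and I want to count the number of times each address occurs and put the result in a NUM_ADDR column. For example: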
ADDR | DATA | NUM_ADDR
00000000bbfe22d0 | 876d4eb163886d4e | 1
00000000b9dfffd0 | 4661bada6d4661ba | 1
00000000b9dfc3d0 | 5d4b40b4705d4b40 | 1
00000000b9def7d0 | 8f8570a5808f8570 | 1
00000000b9de17d0 | 63876d4eb163886d | 1
00000000b9dddfd0 | 6d4eb163886d4eb1 | 3
00000000b9dddfd0 | 705d4b40b4705d4b |
00000000b9dddfd0 | b4705d4b40b4705d |
00000000b7df83d0 | 40b4705d4b40b470 | 1
00000000b7d607d0 | 705d4b40b4705d4b | 1
...
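When doing this by hand, I used the COUNTIF function on each address, but I figure a macro will save time in the long run. Here is a snippet of what I have so far, given that a previous function has already determined the length (number of rows) of the data and stored it in RowCounter: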
Dim CountedAddr(RowCounter, RowCounter) as String
Dim CountedAddrPtr as Integer
Dim CurrentCell as Object
Dim i as Integer
CountedAddrPtr = 0
' Populate CountedAddr array
For i = 1 to RowCounter-1
    CurrentCell = CurrentSheet.getCellByPosition(0, i)
    If Not CurrentCell.String In CountedAddr(?) Then
        CurrentSheet.getCellByPosition(2, i).Value = 1 ' for debugging
        CountedAddr(CountedAddrPtr, 0) = CurrentCell.String
        CountedAddrPtr = CountedAddrPtr + 1
    Else
        CurrentSheet.getCellByPosition(2, i).Value = 0 ' for debugging
    EndIf
Next
' For each unique address, count number of occurrences
For i = 0 to UBound(CountedAddr())
    For j = 1 to RowCounter-1
        If CurrentSheet.getCellByPosition(0, j).String = CountedAddr(i, 0) Then
            CountedAddr(i, 1) = CountedAddr(i, 1)+1
        EndIf
    Next
Next
' Another function to populate NUM_ADDR from CountedAddr array...
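So my first question is: how can we determine whether an element (the address in the current cell) is already in the CountedAddr array (see the (?) above)? Second, is there a more efficient way to implement the second block of code? Unfortunately, sorting is out of the question, because the chronological order of the addresses and data forms something of a time base. Third, is the whole shebang a silly way to attack this problem?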
Many thanks, from a hardware person working on a software task!
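A dictionary-type object such as a VB6 Collection is efficient for looking up items, because it finds the key directly instead of looping through a long array. The countedAddrs collection below stores the count for each address.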
Sub CountAddrs
    Dim countedAddrs As New Collection
    Dim oCurrentSheet As Object
    Dim oCurrentCell As Object
    Dim currentAddr As String
    ' Long rather than Integer, which overflows past 32767 rows.
    Dim i As Long
    Dim newCount As Long
    Dim rowCounter As Long
    Const ADDR_COL = 0
    Const COUNT_COL = 2
    oCurrentSheet = ThisComponent.CurrentController.ActiveSheet
    rowCounter = 11  ' example value; use the real number of rows here
    ' Populate the countedAddrs collection.
    For i = 1 to rowCounter - 1
        oCurrentCell = oCurrentSheet.getCellByPosition(ADDR_COL, i)
        currentAddr = oCurrentCell.String
        If Contains(countedAddrs, currentAddr) Then
            ' Increment the count. A Collection item cannot be changed
            ' in place, so remove it and add it back with the new value.
            newCount = countedAddrs.Item(currentAddr) + 1
            countedAddrs.Remove(currentAddr)
            countedAddrs.Add(newCount, currentAddr)
            oCurrentSheet.getCellByPosition(COUNT_COL, i).Value = newCount ' for debugging
        Else
            countedAddrs.Add(1, currentAddr)
            oCurrentSheet.getCellByPosition(COUNT_COL, i).Value = 1 ' for debugging
        EndIf
    Next
End Sub
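This code requires the following helper function. In most languages, dictionary objects have this functionality built in, but Basic is rather simplistic.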
' Returns True if the collection contains the key, otherwise False.
Function Contains(coll As Collection, key As Variant)
    On Error Goto ErrorHandler
    coll.Item(key)
    Contains = True
    Exit Function
ErrorHandler:
    Contains = False
End Function
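Note that the debugging output in CountAddrs is a running count, while the example table in the question shows the total next to the first occurrence of each address. A minimal sketch of one way to get that layout without sorting is to make two passes: the first builds the totals, the second writes each total at the first row where its address appears. The name FillNumAddr is made up for illustration, rowCounter is again a hard-coded example value, and the sketch assumes the Contains helper above is in the same module.

```basic
' Hypothetical sketch: write each address's total next to its first occurrence.
Sub FillNumAddr
    Dim totals As New Collection  ' key: address string, item: occurrence count
    Dim oSheet As Object
    Dim addr As String
    Dim i As Long
    Dim n As Long
    Dim rowCounter As Long
    Const ADDR_COL = 0
    Const COUNT_COL = 2
    oSheet = ThisComponent.CurrentController.ActiveSheet
    rowCounter = 11  ' assumed example value; use the real number of rows

    ' Pass 1: count how often each address occurs.
    For i = 1 To rowCounter - 1
        addr = oSheet.getCellByPosition(ADDR_COL, i).String
        If Contains(totals, addr) Then
            n = totals.Item(addr) + 1
            totals.Remove(addr)
        Else
            n = 1
        End If
        totals.Add(n, addr)
    Next

    ' Pass 2: write the total at the first occurrence, then drop the key
    ' so that later rows with the same address are skipped.
    For i = 1 To rowCounter - 1
        addr = oSheet.getCellByPosition(ADDR_COL, i).String
        If Contains(totals, addr) Then
            oSheet.getCellByPosition(COUNT_COL, i).Value = totals.Item(addr)
            totals.Remove(addr)
        End If
    Next
End Sub
```

Rows are visited in their original order in both passes, so the time base is preserved. Cells for later occurrences are left untouched, so they stay blank only if they were blank to begin with.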