如何在每半小时后获得最近的日期
How to get the closest date after every half an hour
我有一个非常大的数据集,在
中看起来像这样
Column A
Date
2016-02-29 15:59:59.674
2016-02-29 15:59:59.695
2016-02-29 15:59:59.716
2016-02-29 15:59:59.752
2016-02-29 15:59:59.804
2016-02-29 15:59:59.869
2016-02-29 15:59:59.888
2016-02-29 15:59:59.941
2016-02-29 16:00:00.081 <-- get closest date since .081 < .941
2016-02-29 16:00:00.168
2016-02-29 16:00:00.189
2016-02-29 16:00:00.198
2016-02-29 16:00:00.247
2016-02-29 16:00:00.311
2016-02-29 16:00:00.345
2016-02-29 16:00:00.357
and for the other half an hour
2016-02-29 16:29:58.628
2016-02-29 16:29:58.639
2016-02-29 16:29:58.689
2016-02-29 16:29:58.706
2016-02-29 16:29:58.761
2016-02-29 16:29:58.865
2016-02-29 16:29:59.142
2016-02-29 16:29:59.542
2016-02-29 16:29:59.578
2016-02-29 16:30:00.171 <-- Get this date since .171 < .578
2016-02-29 16:30:00.209
2016-02-29 16:30:00.217
2016-02-29 16:30:00.245
2016-02-29 16:30:00.254
2016-02-29 16:30:00.347
2016-02-29 16:30:00.422
2016-02-29 16:30:00.457
2016-02-29 16:30:00.491
2016-02-29 16:30:00.555
2016-02-29 16:30:00.557
2016-02-29 16:30:00.645
现在数据集中的总行数约为 5468389,这对于 excel 来说非常大,无法将所有内容导入一列,因此我尝试分段处理数据。
还有其他方法吗?通过它我可以处理所有数据?
我尝试直接读取和写入文本,但每当我尝试将其作为日期读取时,由于格式原因,它会给我一个 Type Mismatch
错误。出于同样的原因,我没有使用 python 来解决这个问题,因为我也不精通 python,所以我想在 Excel VBA 中这样做。
另外我不太确定这个逻辑,所以我需要一些帮助。
Option Explicit
Sub Get_Closest_Dates()
Application.ScreenUpdating = False
Dim WI As Worksheet, WO As Worksheet
Dim i As Long, ct As Long
Dim num1 As Integer, num2 As Integer, num3 As Integer
Dim df1, df2
Set WI = Sheet1 'INPUT SHEET
Set WO = Sheet2 'OUTPUT SHEET
WI.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"
WO.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"
WI.Range("B1") = "HOUR"
WI.Range("C1") = "MINUTE"
With WI
.Range("B2").Formula = "=HOUR(A2)"
.Range("B2:B" & Rows.Count).FillDown
.Range("C2").Formula = "=MINUTE(A2)"
.Range("C2:C" & Rows.Count).FillDown
ct = WO.Range("A" & Rows.Count).End(xlUp).Row + 1
For i = 2 To 10000
num1 = .Range("C" & i).Value 'get Minutes
num2 = .Range("C" & i + 1).Value
If (num1 = 29 And num2 = 30) Then
df1 = 0.5 - TimeValue(.Range("A" & i))
df2 = TimeValue(.Range("A" & i + 1)) - 0.5
If df1 < df2 Then
WO.Range("A" & ct) = .Range("A" & i)
ct = ct + 1
Else
WO.Range("A" & ct) = .Range("A" & i + 1)
ct = ct + 1
End If
End If
If (num1 = 59 And num2 = 0) Then
df1 = 1 - TimeValue(.Range("A" & i))
df2 = TimeValue(.Range("A" & i + 1)) - 1
If df1 < df2 Then
WO.Range("A" & ct) = .Range("A" & i)
ct = ct + 1
Else
WO.Range("A" & ct) = .Range("A" & i + 1)
ct = ct + 1
End If
End If
Next i
End With
Application.ScreenUpdating = True
MsgBox "Process Completed"
End Sub
我也不确定如何从日期中获取毫秒部分以避免计算两个日期的差异
喜欢15:59:59.674我怎样才能从时间得到674
?
如果您颠倒排序顺序,您可以使用 Match 函数查找列表中刚好大于(刚好在)特定时间的条目的索引。
类似于:
=MATCH(HalfHourValue,RangeContainingTimes,-1)
您必须颠倒订单;它为您提供索引而不是实际值。
要获取您刚刚找到的条目值的毫秒数,类似以下的方法应该有效:
=RIGHT(TEXT(INDEX(RangeContainingTimes,IxFromAbove,1),"HH:MM:ss.000"),3)
您的第一个问题似乎是将数据输入 Excel。了解 Excel 可能不是处理如此大量数据的最佳程序(诸如 Access 之类的数据库程序可能更好),您需要将数据拆分到多个列或工作表中;或获取数据样本。
您选择了抽样,所以我会在您读入数据时进行抽样和测试。
您还必须处理 Excel/VBA 处理包含毫秒的 date/time 时间戳的限制。
但是为了测试数据,不需要关注毫秒。只要您的数据按升序排列,那么具有等于或大于 30 分钟增量的 date/time 标记的第一行将是最早的行。
下面的代码应该只读取满足该条件的大文件的行。请阅读评论以获取更多信息。
台词汇集成一个集合;然后声明、填充结果数组,并将结果写入工作表。
如果每一行由多个字段组成,而不仅仅是你显示的那一行,那么,在写结果的时候,你会声明结果数组来保存所有的列,然后在那个时候填充它。
使用 Collection / Array / write to the worksheet 序列比在处理时将每一行一次写入工作表要快得多。
有一些方法可以加快代码速度,也有一些方法可以处理可能的 "out of memory" 错误,但这取决于您的真实数据以及这个简单代码的运行情况。
至于将我们现在需要的 date/time 标记转换为 Excel 解释为字符串,到 "real" date/times,这取决于关于您想如何处理后续数据。
==========================================
Option Explicit
'Set Reference to Microsoft Scripting Runtime
Sub GetBigData()
Dim FSO As FileSystemObject
Dim TS As TextStream
Dim vFileName As Variant
Dim sLine As String
Dim dtLineTime As Date
Dim dtNextTime As Date
Dim colLines As Collection
vFileName = Application.GetOpenFilename("Text Files(*.txt), *.txt")
If vFileName = False Then Exit Sub
Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile(vFileName, ForReading, False, TristateFalse)
Set colLines = New Collection
With TS
'Assumes date/time stamps are contiguous
'skip any header lines
Do
sLine = .ReadLine
Loop Until InStr(sLine, ".") > 0
'Compute first "NextTime"
' note that it might be the first entry
' comment line 3 below if want first entry
' but would need to add logic if using other time increments
dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
dtNextTime = Int(dtLineTime) + TimeSerial(Hour(dtLineTime), Int(Minute(dtLineTime) / 30) * 30, 0)
If Not (Minute(dtLineTime) = 30 Or Minute(dtLineTime) = 60) Then dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
Do
'Due to IEEE rounding problems, need to test equality as a very small value
'Could use a value less than 1 second = 1/86400 or smaller
If Abs(dtLineTime - dtNextTime) < 0.00000001 Or _
dtLineTime > dtNextTime Then
colLines.Add sLine
dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
End If
If Not .AtEndOfStream Then
sLine = .ReadLine
dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
End If
Loop Until .AtEndOfStream
.Close
End With
'Write the collection to the worksheet
Dim V As Variant
Dim wsResults As Worksheet, rResults As Range
Dim I As Long
Set wsResults = Worksheets("sheet1")
Set rResults = wsResults.Cells(1, 1)
ReDim V(1 To colLines.Count, 1 To 1)
Set rResults = rResults.Resize(UBound(V, 1), UBound(V, 2))
For I = 1 To UBound(V, 1)
V(I, 1) = CStr(colLines(I))
Next I
With rResults
.EntireColumn.Clear
.NumberFormat = "@"
.Value = V
.EntireColumn.AutoFit
End With
End Sub
==========================================
EDIT 添加了时间戳转换功能。
这可以在数据从集合对象复制到变量数组时实现。例如:
V(I, 1) = ConvertTimeStamp(colLines(I))
由于收到的值是 Double 数据类型,您还需要在工作表上适当地设置该列的格式,而不是将其设置为文本:
.NumberFormat = "yyyy-mm-dd hh:mm:ss.000"
我们必须 return 将值作为 Double,因为 VBA 日期类型数据不支持毫秒。
================================
Private Function ConvertTimeStamp(sTmStmp As String) As Double
Dim dtPart As Date
Dim dMS As Double 'milliseconds
Dim V As Variant
'Convert the date and time
V = Split(sTmStmp, ".")
dtPart = CDate(V(0))
dMS = V(1)
ConvertTimeStamp = dtPart + dMS / 86400 / 1000
End Function
================================
我有一个非常大的数据集,在
中看起来像这样 Column A
Date
2016-02-29 15:59:59.674
2016-02-29 15:59:59.695
2016-02-29 15:59:59.716
2016-02-29 15:59:59.752
2016-02-29 15:59:59.804
2016-02-29 15:59:59.869
2016-02-29 15:59:59.888
2016-02-29 15:59:59.941
2016-02-29 16:00:00.081 <-- get closest date since .081 < .941
2016-02-29 16:00:00.168
2016-02-29 16:00:00.189
2016-02-29 16:00:00.198
2016-02-29 16:00:00.247
2016-02-29 16:00:00.311
2016-02-29 16:00:00.345
2016-02-29 16:00:00.357
and for the other half an hour
2016-02-29 16:29:58.628
2016-02-29 16:29:58.639
2016-02-29 16:29:58.689
2016-02-29 16:29:58.706
2016-02-29 16:29:58.761
2016-02-29 16:29:58.865
2016-02-29 16:29:59.142
2016-02-29 16:29:59.542
2016-02-29 16:29:59.578
2016-02-29 16:30:00.171 <-- Get this date since .171 < .578
2016-02-29 16:30:00.209
2016-02-29 16:30:00.217
2016-02-29 16:30:00.245
2016-02-29 16:30:00.254
2016-02-29 16:30:00.347
2016-02-29 16:30:00.422
2016-02-29 16:30:00.457
2016-02-29 16:30:00.491
2016-02-29 16:30:00.555
2016-02-29 16:30:00.557
2016-02-29 16:30:00.645
现在数据集中的总行数约为 5468389,这对于 excel 来说非常大,无法将所有内容导入一列,因此我尝试分段处理数据。
还有其他方法吗?通过它我可以处理所有数据?
我尝试直接读取和写入文本,但每当我尝试将其作为日期读取时,由于格式原因,它会给我一个 Type Mismatch
错误。出于同样的原因,我没有使用 python 来解决这个问题,因为我也不精通 python,所以我想在 Excel VBA 中这样做。
另外我不太确定这个逻辑,所以我需要一些帮助。
Option Explicit
Sub Get_Closest_Dates()
Application.ScreenUpdating = False
Dim WI As Worksheet, WO As Worksheet
Dim i As Long, ct As Long
Dim num1 As Integer, num2 As Integer, num3 As Integer
Dim df1, df2
Set WI = Sheet1 'INPUT SHEET
Set WO = Sheet2 'OUTPUT SHEET
WI.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"
WO.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"
WI.Range("B1") = "HOUR"
WI.Range("C1") = "MINUTE"
With WI
.Range("B2").Formula = "=HOUR(A2)"
.Range("B2:B" & Rows.Count).FillDown
.Range("C2").Formula = "=MINUTE(A2)"
.Range("C2:C" & Rows.Count).FillDown
ct = WO.Range("A" & Rows.Count).End(xlUp).Row + 1
For i = 2 To 10000
num1 = .Range("C" & i).Value 'get Minutes
num2 = .Range("C" & i + 1).Value
If (num1 = 29 And num2 = 30) Then
df1 = 0.5 - TimeValue(.Range("A" & i))
df2 = TimeValue(.Range("A" & i + 1)) - 0.5
If df1 < df2 Then
WO.Range("A" & ct) = .Range("A" & i)
ct = ct + 1
Else
WO.Range("A" & ct) = .Range("A" & i + 1)
ct = ct + 1
End If
End If
If (num1 = 59 And num2 = 0) Then
df1 = 1 - TimeValue(.Range("A" & i))
df2 = TimeValue(.Range("A" & i + 1)) - 1
If df1 < df2 Then
WO.Range("A" & ct) = .Range("A" & i)
ct = ct + 1
Else
WO.Range("A" & ct) = .Range("A" & i + 1)
ct = ct + 1
End If
End If
Next i
End With
Application.ScreenUpdating = True
MsgBox "Process Completed"
End Sub
我也不确定如何从日期中获取毫秒部分以避免计算两个日期的差异
喜欢15:59:59.674我怎样才能从时间得到674
?
如果您颠倒排序顺序,您可以使用 Match 函数查找列表中刚好大于(刚好在)特定时间的条目的索引。 类似于:
=MATCH(HalfHourValue,RangeContainingTimes,-1)
您必须颠倒订单;它为您提供索引而不是实际值。
要获取您刚刚找到的条目值的毫秒数,类似以下的方法应该有效:
=RIGHT(TEXT(INDEX(RangeContainingTimes,IxFromAbove,1),"HH:MM:ss.000"),3)
您的第一个问题似乎是将数据输入 Excel。了解 Excel 可能不是处理如此大量数据的最佳程序(诸如 Access 之类的数据库程序可能更好),您需要将数据拆分到多个列或工作表中;或获取数据样本。
您选择了抽样,所以我会在您读入数据时进行抽样和测试。
您还必须处理 Excel/VBA 处理包含毫秒的 date/time 时间戳的限制。
但是为了测试数据,不需要关注毫秒。只要您的数据按升序排列,那么具有等于或大于 30 分钟增量的 date/time 标记的第一行将是最早的行。
下面的代码应该只读取满足该条件的大文件的行。请阅读评论以获取更多信息。
台词汇集成一个集合;然后声明、填充结果数组,并将结果写入工作表。
如果每一行由多个字段组成,而不仅仅是你显示的那一行,那么,在写结果的时候,你会声明结果数组来保存所有的列,然后在那个时候填充它。
使用 Collection / Array / write to the worksheet 序列比在处理时将每一行一次写入工作表要快得多。
有一些方法可以加快代码速度,也有一些方法可以处理可能的 "out of memory" 错误,但这取决于您的真实数据以及这个简单代码的运行情况。
至于将我们现在需要的 date/time 标记转换为 Excel 解释为字符串,到 "real" date/times,这取决于关于您想如何处理后续数据。
==========================================
Option Explicit
'Set Reference to Microsoft Scripting Runtime
Sub GetBigData()
Dim FSO As FileSystemObject
Dim TS As TextStream
Dim vFileName As Variant
Dim sLine As String
Dim dtLineTime As Date
Dim dtNextTime As Date
Dim colLines As Collection
vFileName = Application.GetOpenFilename("Text Files(*.txt), *.txt")
If vFileName = False Then Exit Sub
Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile(vFileName, ForReading, False, TristateFalse)
Set colLines = New Collection
With TS
'Assumes date/time stamps are contiguous
'skip any header lines
Do
sLine = .ReadLine
Loop Until InStr(sLine, ".") > 0
'Compute first "NextTime"
' note that it might be the first entry
' comment line 3 below if want first entry
' but would need to add logic if using other time increments
dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
dtNextTime = Int(dtLineTime) + TimeSerial(Hour(dtLineTime), Int(Minute(dtLineTime) / 30) * 30, 0)
If Not (Minute(dtLineTime) = 30 Or Minute(dtLineTime) = 60) Then dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
Do
'Due to IEEE rounding problems, need to test equality as a very small value
'Could use a value less than 1 second = 1/86400 or smaller
If Abs(dtLineTime - dtNextTime) < 0.00000001 Or _
dtLineTime > dtNextTime Then
colLines.Add sLine
dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
End If
If Not .AtEndOfStream Then
sLine = .ReadLine
dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
End If
Loop Until .AtEndOfStream
.Close
End With
'Write the collection to the worksheet
Dim V As Variant
Dim wsResults As Worksheet, rResults As Range
Dim I As Long
Set wsResults = Worksheets("sheet1")
Set rResults = wsResults.Cells(1, 1)
ReDim V(1 To colLines.Count, 1 To 1)
Set rResults = rResults.Resize(UBound(V, 1), UBound(V, 2))
For I = 1 To UBound(V, 1)
V(I, 1) = CStr(colLines(I))
Next I
With rResults
.EntireColumn.Clear
.NumberFormat = "@"
.Value = V
.EntireColumn.AutoFit
End With
End Sub
==========================================
EDIT 添加了时间戳转换功能。 这可以在数据从集合对象复制到变量数组时实现。例如:
V(I, 1) = ConvertTimeStamp(colLines(I))
由于收到的值是 Double 数据类型,您还需要在工作表上适当地设置该列的格式,而不是将其设置为文本:
.NumberFormat = "yyyy-mm-dd hh:mm:ss.000"
我们必须 return 将值作为 Double,因为 VBA 日期类型数据不支持毫秒。
================================
Private Function ConvertTimeStamp(sTmStmp As String) As Double
Dim dtPart As Date
Dim dMS As Double 'milliseconds
Dim V As Variant
'Convert the date and time
V = Split(sTmStmp, ".")
dtPart = CDate(V(0))
dMS = V(1)
ConvertTimeStamp = dtPart + dMS / 86400 / 1000
End Function
================================