Last Observation Carried Forward in Python with datetime
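I have this event dataset in which only changes were recorded at retrieval time, and I would like to convert it into a uniform time series. The data is recorded at 12-hour intervals. retrieval_time is an object column and start_time is datetime64.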
ID Count retrieval_time start_time
100231380 70 2017-10-11T23:30:00.000+10:30 21/10/17 23:30
100231380 70 2017-10-12T11:30:00.000+10:30 21/10/17 23:30
100231380 72 2017-10-12T23:30:00.000+10:30 21/10/17 23:30
100231380 72 2017-10-13T11:30:00.000+10:30 21/10/17 23:30
100231380 73 2017-10-13T23:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-14T11:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-14T23:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-15T11:30:00.000+10:30 21/10/17 23:30
100231380 77 2017-10-15T23:30:00.000+10:30 21/10/17 23:30
100231380 83 2017-10-16T11:30:00.000+10:30 21/10/17 23:30
100231380 85 2017-10-16T23:30:00.000+10:30 21/10/17 23:30
100231380 85 2017-10-17T11:30:00.000+10:30 21/10/17 23:30
100231380 90 2017-10-17T23:30:00.000+10:30 21/10/17 23:30
100231380 90 2017-10-18T11:30:00.000+10:30 21/10/17 23:30
100231380 93 2017-10-18T23:30:00.000+10:30 21/10/17 23:30
100231380 99 2017-10-19T23:30:00.000+10:30 21/10/17 23:30
100231380 104 2017-10-20T23:30:00.000+10:30 21/10/17 23:30
100231380 117 2017-10-21T23:30:00.000+10:30 21/10/17 23:30
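I would like to make it consistent. For example, in the last 3 rows, from retrieval time 19/10/2017 onwards, no data was recorded at 11:30 am. I would like to add a row for each missing time and fill it with the last observation for the whole row.
I would like the output to look like this: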
ID Count retrieval_time start_time
100231380 70 2017-10-11T23:30:00.000+10:30 21/10/17 23:30
100231380 70 2017-10-12T11:30:00.000+10:30 21/10/17 23:30
100231380 72 2017-10-12T23:30:00.000+10:30 21/10/17 23:30
100231380 72 2017-10-13T11:30:00.000+10:30 21/10/17 23:30
100231380 73 2017-10-13T23:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-14T11:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-14T23:30:00.000+10:30 21/10/17 23:30
100231380 74 2017-10-15T11:30:00.000+10:30 21/10/17 23:30
100231380 77 2017-10-15T23:30:00.000+10:30 21/10/17 23:30
100231380 83 2017-10-16T11:30:00.000+10:30 21/10/17 23:30
100231380 85 2017-10-16T23:30:00.000+10:30 21/10/17 23:30
100231380 85 2017-10-17T11:30:00.000+10:30 21/10/17 23:30
100231380 90 2017-10-17T23:30:00.000+10:30 21/10/17 23:30
100231380 90 2017-10-18T11:30:00.000+10:30 21/10/17 23:30
100231380 93 2017-10-18T23:30:00.000+10:30 21/10/17 23:30
100231380 93 2017-10-19T11:30:00.000+10:30 21/10/17 23:30
100231380 99 2017-10-19T23:30:00.000+10:30 21/10/17 23:30
100231380 99 2017-10-20T11:30:00.000+10:30 21/10/17 23:30
100231380 104 2017-10-20T23:30:00.000+10:30 21/10/17 23:30
100231380 104 2017-10-21T11:30:00.000+10:30 21/10/17 23:30
100231380 117 2017-10-21T23:30:00.000+10:30 21/10/17 23:30
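I would also like to know how to format retrieval_time and start_time so that they look alike and are easy to compare.
I am also after a generic solution, because I have aggregated grouped data for multiple events. The interval is the same 12 hours for all of them, but retrieval_time and start_time differ from event to event.
Thanks.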
As I understand the question, this is how I would implement it.
My csv data is:
id,count,ret_time,start_time
10022,60,2017-10-11T11:30:00.000+10:30,21/10/2017 23:30
10023,70,2017-10-11T23:30:00.000+10:30,21/10/2017 23:30
10024,70,2017-10-12T11:30:00.000+10:30,21/10/2017 23:30
10025,80,2017-10-12T23:30:00.000+10:30,21/10/2017 23:30
10026,90,2017-10-13T11:30:00.000+10:30,21/10/2017 23:30
10027,95,2017-10-14T11:30:00.000+10:30,21/10/2017 23:30
The script is as follows:
import csv
import datetime
from pathlib import Path

# Read csv data (my file is in a 'data' folder under the working directory)
file_path = Path.cwd() / 'data' / 'stack_overflow.csv'

# Create list to store csv data
csv_data = []

# Read csv file
with open(file_path, newline='') as csv_file:
    read_csv = csv.reader(csv_file, delimiter=',')
    # Skip header
    next(read_csv)
    for row in read_csv:
        # Add each row to the end of the list
        csv_data.append(row)

# Convert ret_time from a string to epoch seconds (a float).
# datetime.strptime keeps the +10:30 offset, which time.mktime would ignore;
# parsing an offset that contains a colon with %z needs Python 3.7+.
for row in csv_data:
    row[2] = datetime.datetime.strptime(row[2], '%Y-%m-%dT%H:%M:%S.%f%z').timestamp()

# Build a new list instead of inserting into csv_data while looping over it,
# which would shift the indexes and leave the last rows unchecked.
step = 12 * 3600  # 12 hours in seconds
filled = [csv_data[0]]
for row in csv_data[1:]:
    # While the gap to the previous row is more than 12 hours, copy the
    # previous row, move its time 12 hours ahead and add it before this row
    while row[2] - filled[-1][2] > step:
        temp_list = filled[-1].copy()
        temp_list[2] += step
        filled.append(temp_list)
    filled.append(row)
csv_data = filled

# Shows that a copy of id 10026 is added before 10027,
# because the 11:30 PM value on 2017-10-13 is missing
print(csv_data)
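The same gap-filling idea can be wrapped in a function and applied per event, which is roughly what the question asks for when several events are mixed together. The following is only a minimal sketch under some assumptions: the names `fill_gaps`, `fill_gaps_by_event` and `parse` are hypothetical, rows are `[id, count, ret_time, start_time]` lists, the id column identifies the event (as in the question's data, not in my csv above), and `ret_time` is parsed to a timezone-aware `datetime` rather than a float.

```python
from collections import defaultdict
from datetime import datetime, timedelta

STEP = timedelta(hours=12)  # assumed sampling interval, taken from the question


def fill_gaps(rows, step=STEP):
    """Return rows with every gap longer than `step` filled by carrying
    the previous row forward (rows must be sorted by ret_time, index 2)."""
    filled = []
    for row in rows:
        # Insert copies of the last row until the gap is at most one step
        while filled and row[2] - filled[-1][2] > step:
            carried = list(filled[-1])
            carried[2] = carried[2] + step
            filled.append(carried)
        filled.append(row)
    return filled


def fill_gaps_by_event(rows, step=STEP):
    """Group rows by event id (index 0), sort each group by ret_time
    and fill each group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    result = []
    for event_rows in groups.values():
        result.extend(fill_gaps(sorted(event_rows, key=lambda r: r[2]), step))
    return result


def parse(ts):
    # Parse the ret_time string, keeping its UTC offset (Python 3.7+)
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.%f%z')


# The last three rows of the question's data, where the 11:30 am values are missing
sample = [
    ['100231380', '93', parse('2017-10-18T23:30:00.000+10:30'), '21/10/17 23:30'],
    ['100231380', '99', parse('2017-10-19T23:30:00.000+10:30'), '21/10/17 23:30'],
    ['100231380', '104', parse('2017-10-20T23:30:00.000+10:30'), '21/10/17 23:30'],
]
result = fill_gaps_by_event(sample)
for row in result:
    print(row[0], row[1], row[2].isoformat(), row[3])
```

Because each event is filled on its own, a gap in one event never picks up the last observation of another event.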
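You can convert start_time with the same logic, using a format string that matches its layout (it has no UTC offset, so it is interpreted as local time):
for row in csv_data:
    row[3] = datetime.datetime.strptime(row[3], '%d/%m/%Y %H:%M').timestamp()
Then compare ret_time and start_time.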
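An alternative way to make the two columns comparable is to parse both into timezone-aware `datetime` objects instead of floats. The sketch below assumes that start_time, which carries no offset of its own, is expressed in the same +10:30 offset as ret_time; the data does not state this, so treat it as an assumption. The helper names `parse_ret_time` and `parse_start_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: start_time is local time in the same UTC+10:30 offset as ret_time
ASSUMED_TZ = timezone(timedelta(hours=10, minutes=30))


def parse_ret_time(text):
    # e.g. '2017-10-21T23:30:00.000+10:30' (colon in the offset needs Python 3.7+)
    return datetime.strptime(text, '%Y-%m-%dT%H:%M:%S.%f%z')


def parse_start_time(text, tz=ASSUMED_TZ):
    # e.g. '21/10/2017 23:30'; attach the assumed offset so the value becomes aware
    return datetime.strptime(text, '%d/%m/%Y %H:%M').replace(tzinfo=tz)


ret_time = parse_ret_time('2017-10-14T11:30:00.000+10:30')
start_time = parse_start_time('21/10/2017 23:30')

# Aware datetimes can be compared and subtracted directly
print(ret_time < start_time)
time_to_start = start_time - ret_time
print(time_to_start)
```

For the two-digit-year form used in the question ('21/10/17 23:30'), the format string would be '%d/%m/%y %H:%M' instead.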
Hope this helps.