Finding time intervals per day from a list of timestamps in Python
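I am trying to calculate the time interval per day from a list of unix timestamps in Python. I have searched for similar questions on Stack Overflow, but mostly found examples of calculating deltas or SQL solutions.
I have a list like this: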
timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]
I can easily convert this list of timestamps to datetime objects using:
import datetime as dt
[dt.datetime.fromtimestamp(int(i)) for i in timestamps]
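From there I could probably write some fairly lengthy code that keeps the first day/month and checks whether the next item in the list has the same day/month. If it does, I would look at the times, take the first and the last one of that day, and store the interval plus the day/month in a dictionary.
As I am fairly new to Python, I would like to know the best way to do this in the language without abusing if/else statements.
Thanks in advance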
You can use collections.defaultdict. It comes in handy when you are building a collection without an initial estimate of its size and members.
import datetime as dt
from collections import defaultdict
# Initialize the default dict with the type list.
# Accessing a member that doesn't exist creates that entry with the default value for that type,
# so accessing a non-existent key here adds an empty list to the collection.
intervalsByDate = defaultdict(list)
for t in timestamps:
    # Use a separate name so the dt module is not shadowed
    d = dt.datetime.fromtimestamp(t)
    myDateKey = (d.day, d.month, d.year)
    # If the key doesn't exist, a new empty list is added
    intervalsByDate[myDateKey].append(t)
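After this, intervalsByDate is a dict whose values are lists of timestamps grouped by calendar date. For each date you can sort the timestamps and get the total interval. Iterating over a defaultdict is the same as iterating over a dict (it is a subclass of dict).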
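output = {}
# items() works on both Python 2 and 3; iteritems() exists only on Python 2
for date, stamps in intervalsByDate.items():
    sortedStamps = sorted(stamps)
    # Last minus first timestamp of the day, in seconds
    output[date] = sortedStamps[-1] - sortedStamps[0]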
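Now output is a map from dates to time intervals, with the interval in seconds as the value (unix timestamps count seconds, not milliseconds). Do with it as you please!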
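The two steps above can also be combined into one short script. This is only a sketch under a few assumptions: it uses a four-element slice of the question's list, keys the dict by datetime.date objects instead of (day, month, year) tuples, and converts in UTC rather than local time so the grouping does not depend on the machine's timezone. The names stamps_by_date and interval_by_date are mine, not from the answer.

```python
from collections import defaultdict
from datetime import datetime, timezone

# A short slice of the list from the question
timestamps = [1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0]

# Group the raw timestamps by UTC calendar date (assumption: UTC, not local time)
stamps_by_date = defaultdict(list)
for ts in timestamps:
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
    stamps_by_date[day].append(ts)

# Interval per day in seconds: latest minus earliest timestamp of that day
interval_by_date = {day: max(stamps) - min(stamps)
                    for day, stamps in stamps_by_date.items()}

# date objects sort chronologically, so sorted() gives calendar order
for day in sorted(interval_by_date):
    print(day, interval_by_date[day])
```

Because the keys are date objects, sorting them gives chronological order directly, which a (day, month, year) tuple does not.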
Edit
Is it normal that the keys are not ordered? I have keys from different months mixed together.
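Yes, because (hash)maps and dicts are essentially unordered. Since Python 3.7 a dict keeps insertion order, but that is still not the same as being sorted by date.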
How would I be able to, for example, select the first 24 days from a month and then the last
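If I were adamant about keeping this approach, I would probably look into an ordered default dict. However, you can change the data type of output to something other than a dict to suit your needs, for example a list sorted by date.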
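As a rough illustration of the list-sorted-by-date idea, here is a sketch. The sample dict is hypothetical data shaped like output above, and the split at day 24 is only my reading of the comment, which is cut off.

```python
# Hypothetical sample shaped like `output` above: (day, month, year) -> seconds
output = {
    (30, 5, 2007): 0.0,
    (26, 4, 2007): 1128.0,
    (15, 5, 2007): 1197.0,
    (5, 5, 2007): 52312.0,
}

# Reverse each key to (year, month, day) so tuples compare chronologically
ordered = sorted(output.items(), key=lambda item: item[0][::-1])

# Split one month (May 2007) into days 1-24 and the remaining days
may = [(key, seconds) for key, seconds in ordered if key[1:] == (5, 2007)]
first_24_days = [entry for entry in may if entry[0][0] <= 24]
rest_of_month = [entry for entry in may if entry[0][0] > 24]

print(first_24_days)
print(rest_of_month)
```

Storing the key as (year, month, day) in the first place would make the custom sort key unnecessary.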
Simply subtract the two dates from each other. This yields a timedelta instance.
See datetime.timedelta:
https://docs.python.org/2/library/datetime.html#timedelta-objects
from datetime import datetime
# month=01 is a syntax error on Python 3, so use plain integers
delta = datetime.today() - datetime(year=2015, month=1, day=1)
# Actual printed values depend on when you execute this :-)
print(delta.days, delta.seconds, delta.microseconds)  # e.g. 49 50817 381000
print(delta.total_seconds())  # e.g. 4284417.381, which is 49*24*3600 + 50817 + 381000/1000000
Combine this with list slicing and zip to get your solution. An example solution would be:
timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]
timestamps_as_dates = [datetime.fromtimestamp(int(i)) for i in timestamps]
# Make couples of each timestamp with the next one
# timestamps_as_dates[:-1] -> all your timestamps but the last one
# timestamps_as_dates[1:] -> all your timestamps but the first one
# zip them together so that first and second are one couple, then second and third, ...
intervals = zip(timestamps_as_dates[:-1],timestamps_as_dates[1:])
interval_timedeltas = [(interval[1]-interval[0]).total_seconds() for interval in intervals]
# result = [95314.0, 110404.0, 1174817.0, 858.0, 270.0, 217494.0, 510809.0, 52312.0, 36189.0, 488764.0, 55477.0, 148213.0, 133393.0, 1197.0, 309343.0, 97457.0, 877326.0, 214286.0, 622947.0, 604571.0, 988713.0, 347289.0]
This also works for adding a specific period of time to a date, or subtracting it:
from datetime import datetime, timedelta
tomorrow = datetime.today() + timedelta(days=1)
I have no simple solution for adding or subtracting months or years.
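For what it's worth, months can be shifted with the standard library alone. The sketch below is not part of the answer; add_months is a hypothetical helper name, and clamping the day to the end of a shorter month is just one possible convention.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Hypothetical helper: shift a date by whole months, clamping the day."""
    # Count months from year 0 so that negative shifts also work
    month_index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))

print(add_months(date(2007, 1, 31), 1))   # 2007-02-28
print(add_months(date(2007, 5, 15), -6))  # 2006-11-15
```

Shifting by years is the same call with a multiple of 12, e.g. add_months(d, 12).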
If the list is sorted, as it is in your case, then you can use itertools.groupby() to group the timestamps into days:
#!/usr/bin/env python
from datetime import date, timedelta
from itertools import groupby
epoch = date(1970, 1, 1)
result = {}
assert timestamps == sorted(timestamps)
for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
    # store the interval + day/month in a dictionary.
    same_day = list(group)
    assert max(same_day) == same_day[-1] and min(same_day) == same_day[0]
    result[epoch + timedelta(day)] = same_day[0], same_day[-1]
print(result)
Output
{datetime.date(2007, 4, 10): (1176239419.0, 1176239419.0),
datetime.date(2007, 4, 11): (1176334733.0, 1176334733.0),
datetime.date(2007, 4, 13): (1176445137.0, 1176445137.0),
datetime.date(2007, 4, 26): (1177619954.0, 1177621082.0),
datetime.date(2007, 4, 29): (1177838576.0, 1177838576.0),
datetime.date(2007, 5, 5): (1178349385.0, 1178401697.0),
datetime.date(2007, 5, 6): (1178437886.0, 1178437886.0),
datetime.date(2007, 5, 11): (1178926650.0, 1178926650.0),
datetime.date(2007, 5, 12): (1178982127.0, 1178982127.0),
datetime.date(2007, 5, 14): (1179130340.0, 1179130340.0),
datetime.date(2007, 5, 15): (1179263733.0, 1179264930.0),
datetime.date(2007, 5, 19): (1179574273.0, 1179574273.0),
datetime.date(2007, 5, 20): (1179671730.0, 1179671730.0),
datetime.date(2007, 5, 30): (1180549056.0, 1180549056.0),
datetime.date(2007, 6, 2): (1180763342.0, 1180763342.0),
datetime.date(2007, 6, 9): (1181386289.0, 1181386289.0),
datetime.date(2007, 6, 16): (1181990860.0, 1181990860.0),
datetime.date(2007, 6, 27): (1182979573.0, 1182979573.0),
datetime.date(2007, 7, 1): (1183326862.0, 1183326862.0)}
If there is only one timestamp on a given day, it appears twice (as both the first and the last value).
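If you want the length of each interval rather than its endpoints, the (first, last) pairs can be turned into timedelta objects. A small sketch, using three entries copied from the output above; the name durations is mine.

```python
from datetime import date, timedelta

# Three entries copied from the output above: date -> (first, last)
result = {
    date(2007, 4, 26): (1177619954.0, 1177621082.0),
    date(2007, 4, 29): (1177838576.0, 1177838576.0),
    date(2007, 5, 5): (1178349385.0, 1178401697.0),
}

# A day with a single timestamp gets a zero-length interval
durations = {day: timedelta(seconds=last - first)
             for day, (first, last) in result.items()}

for day, duration in sorted(durations.items()):
    print(day, duration)
```

For these three days this prints 0:18:48, 0:00:00 and 14:31:52.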
How would you afterwards test whether the last (for example) 5 entries in the result have a larger interval than the previous 14?
entries = sorted(result.items())
intervals = [(end - start) for _, (start, end) in entries]
print(max(intervals[-5:]) > max(intervals[-5-14:-5]))
# -> False