Minimum and maximum timestamps per hour bucket

I have a text file with timestamps.

Example:

16-07-2015 18:08:20
16-07-2015 18:08:22
16-07-2015 18:08:30
16-07-2015 18:08:40
17-07-2015 10:04:01
17-07-2015 10:14:31
17-07-2015 10:14:59
17-07-2015 12:24:11
....

Now I need the minimum and maximum timestamp for each hour, as in the example below.

Example:

16-07-2015 18:08:20 - 16-07-2015 18:08:40
17-07-2015 10:04:01 - 17-07-2015 10:14:59
17-07-2015 12:24:11 - ....

How can I do this?

If you have an iterable of datetime objects, you can group them by day and hour with itertools.groupby() and take the first and last object of each group:

from itertools import groupby

def min_max_per_hour(iterable):
    for dayhour, grouped in groupby(iterable, lambda dt: (dt.date(), dt.hour)):
        minimum = next(grouped)  # first object is the minimum for this hour
        maximum = minimum  # starting value
        for dt in grouped:
            maximum = dt   # last assignment is the maximum within this hour
        yield (minimum, maximum)

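This relies on the iterable of datetime objects being sorted: groupby() only groups consecutive items that share the same key, and the first and last items of a group are the minimum and maximum only when the input is in chronological order.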
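If the file is not guaranteed to be in chronological order, one option is to collect the minimum and maximum per (date, hour) key in a dictionary instead of using groupby(). The following is a minimal sketch under that assumption; min_max_per_hour_unsorted is a hypothetical name, not part of the answer above, and it keeps one entry per hour bucket in memory:

```python
from datetime import datetime

def min_max_per_hour_unsorted(iterable):
    """Hypothetical variant that does not require sorted input."""
    buckets = {}  # (date, hour) -> (minimum, maximum) seen so far
    for dt in iterable:
        key = (dt.date(), dt.hour)
        if key in buckets:
            lo, hi = buckets[key]
            buckets[key] = (min(lo, dt), max(hi, dt))
        else:
            buckets[key] = (dt, dt)
    # Return the pairs in chronological bucket order
    return [buckets[key] for key in sorted(buckets)]
```

The trade-off is that results are only available once the whole input has been read, whereas the groupby() version yields each hour as soon as it is complete.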

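To produce the input iterable, parse the text file in a generator expression or another generator; there is no need to hold everything in memory at once: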

from datetime import datetime

with open(input_filename) as inf:
    # generator expression
    datetimes = (datetime.strptime(line.strip(), '%d-%m-%Y %H:%M:%S') for line in inf)
    for mindt, maxdt in min_max_per_hour(datetimes):
        print(mindt, maxdt)

Demo:

>>> from datetime import datetime
>>> from itertools import groupby
>>> def min_max_per_hour(iterable):
...     for dayhour, grouped in groupby(iterable, lambda dt: (dt.date(), dt.hour)):
...         minimum = next(grouped)  # first object is the minimum for this hour
...         maximum = minimum  # starting value
...         for dt in grouped:
...             maximum = dt   # last assignment is the maximum within this hour
...         yield (minimum, maximum)
...
>>> textfile = '''\
... 16-07-2015 18:08:20
... 16-07-2015 18:08:22
... 16-07-2015 18:08:30
... 16-07-2015 18:08:40
... 17-07-2015 10:04:01
... 17-07-2015 10:14:31
... 17-07-2015 10:14:59
... 17-07-2015 12:24:11
... '''.splitlines()
>>> datetimes = (datetime.strptime(line.strip(), '%d-%m-%Y %H:%M:%S') for line in textfile)
>>> for mindt, maxdt in min_max_per_hour(datetimes):
...     print(mindt, maxdt)
...
2015-07-16 18:08:20 2015-07-16 18:08:40
2015-07-17 10:04:01 2015-07-17 10:14:59
2015-07-17 12:24:11 2015-07-17 12:24:11
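
The demo prints the datetime objects in their default string form. To reproduce the layout asked for in the question (day-month-year, with the two values separated by " - "), the pair can be formatted with strftime(). This is a small sketch; format_range and FMT are hypothetical names introduced here for illustration:

```python
from datetime import datetime

FMT = '%d-%m-%Y %H:%M:%S'  # same format string used for parsing

def format_range(mindt, maxdt, fmt=FMT):
    """Hypothetical helper: render a (minimum, maximum) pair as one line."""
    return '{} - {}'.format(mindt.strftime(fmt), maxdt.strftime(fmt))

print(format_range(datetime(2015, 7, 16, 18, 8, 20),
                   datetime(2015, 7, 16, 18, 8, 40)))
# 16-07-2015 18:08:20 - 16-07-2015 18:08:40
```

In the loop over min_max_per_hour(), this would replace the plain print of mindt and maxdt.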