Group timestamp every minute

Question

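I am writing Python code that works with a DataFrame. I managed to group the timestamps by second, as shown below, but I don't know how to group them by minute.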

                          Price
timestamp   
2018-06-01 00:00:00.155449  13530.909091
2018-06-01 00:00:01.155449  13530.909091
2018-06-01 00:00:02.155451  13530.909091
2018-06-01 00:00:03.155452  13530.909091
2018-06-01 00:00:04.155453  13530.909091
... ...
2018-06-01 23:59:55.735402  13285.000000
2018-06-01 23:59:56.894110  13285.000000
2018-06-01 23:59:57.894110  13285.000000
2018-06-01 23:59:58.894110  13285.000000
2018-06-01 23:59:59.894110  13285.000000

I used the groupby method like this: sell_price = sell.groupby('timestamp').price.mean()

How can I group these timestamps by minute?

My expected result:

timestamp                      price
    2018-06-01 00:01:00.155449  13530.909091
    2018-06-01 00:02:00.155449  13530.909091
    2018-06-01 00:03:00.155451  13530.909091
    2018-06-01 00:04:00.155452  13530.909091
    2018-06-01 00:05:00.155453  13530.909091
    ... ...
    2018-06-01 23:55:00.735402  13285.000000
    2018-06-01 23:56:00.894110  13285.000000
    2018-06-01 23:57:00.894110  13285.000000
    2018-06-01 23:58:00.894110  13285.000000
    2018-06-01 23:59:00.894110  13285.000000

Answer 1

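Pandas has several functions for working with time series, and they are easier to use when the timestamps are the index. What you want to do is downsample the data.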

"Downsampling is to resample a time-series dataset to a wider time frame. For example, from minutes to hours, from days to years. The result will have a reduced number of rows and values can be aggregated with mean(), min(), max(), sum() etc." (from "Pandas resample() tricks you should know for manipulating time-series data")
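As a small illustration of the quoted definition, the sketch below downsamples six made-up readings taken every 20 seconds into one-minute bins and aggregates each bin in several ways. The series `s` and its values are invented for this example and are not the asker's data.

```python
import pandas as pd

# Hypothetical data: six readings, one every 20 seconds (two full minutes)
idx = pd.date_range("2018-06-01 00:00:00", periods=6, freq="20s")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsample to one-minute bins and aggregate each bin several ways
summary = s.resample("min").agg(["mean", "min", "max", "sum"])
print(summary)
```

Six rows collapse into two, one per minute, with one column per aggregation.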

Here is code that performs the task.

import pandas as pd
import numpy as np

# Set a random seed to allow reproducibility
# (np.random.seed is a function; assigning to it would not seed anything)
np.random.seed(42)

# Create array with timestamps
dates = pd.date_range(start="2018-06-01 00:00:00",
                      end="2018-06-01 23:59:59",
                      freq="s")

# Create array with random prices
prices = np.random.uniform(low=13285, high=13530.9, size=len(dates))

# Create the DataFrame
df = pd.DataFrame(data=prices, index=dates)

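# Resample to one-minute bins and take the mean of each bin
# ("min" is the minute alias; "T" is deprecated in recent pandas versions)
df.resample("min").mean()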

This is the output (the exact values depend on the random draw):

2018-06-01 00:00:00  13421.290908
2018-06-01 00:01:00  13414.707903
2018-06-01 00:02:00  13394.962477
2018-06-01 00:03:00  13413.036905
2018-06-01 00:04:00  13412.717874
                          ...
2018-06-01 23:55:00  13412.137577
2018-06-01 23:56:00  13409.450838
2018-06-01 23:57:00  13411.499249
2018-06-01 23:58:00  13398.442782
2018-06-01 23:59:00  13412.034963

[1440 rows x 1 columns]

The original data contains 86,400 rows, while the downsampled data contains only 1,440.
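If the timestamps live in a regular column rather than in the index, as the groupby('timestamp') call in the question suggests, one option is to group with pd.Grouper. This is only a sketch under assumptions: the frame name `sell` and the column names `timestamp` and `price` are taken from the question, and the sample values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: timestamps in a column
sell = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-06-01 00:00:00.155449",
        "2018-06-01 00:00:01.155449",
        "2018-06-01 00:01:02.155451",
        "2018-06-01 00:01:03.155452",
    ]),
    "price": [13530.0, 13532.0, 13285.0, 13287.0],
})

# Group rows into one-minute bins and average the price in each bin
sell_price = sell.groupby(pd.Grouper(key="timestamp", freq="min"))["price"].mean()
print(sell_price)
```

Each bin is labeled by the start of its minute (for example 2018-06-01 00:00:00), not by the first timestamp that falls inside it, so the microsecond part shown in the expected result is not preserved. Setting the column as the index and calling resample("min"), as in the code above, should give the same grouping.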
