Group timestamps by minute
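I'm writing Python code that works with a DataFrame. I managed to group the timestamps by second, as shown below, but I don't know how to group them by minute.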
Price
timestamp
2018-06-01 00:00:00.155449 13530.909091
2018-06-01 00:00:01.155449 13530.909091
2018-06-01 00:00:02.155451 13530.909091
2018-06-01 00:00:03.155452 13530.909091
2018-06-01 00:00:04.155453 13530.909091
... ...
2018-06-01 23:59:55.735402 13285.000000
2018-06-01 23:59:56.894110 13285.000000
2018-06-01 23:59:57.894110 13285.000000
2018-06-01 23:59:58.894110 13285.000000
2018-06-01 23:59:59.894110 13285.000000
I used the groupby method like this: sell_price = sell.groupby('timestamp').price.mean()
How can I group these timestamps by minute?
My expected result:
timestamp price
2018-06-01 00:01:00.155449 13530.909091
2018-06-01 00:02:00.155449 13530.909091
2018-06-01 00:03:00.155451 13530.909091
2018-06-01 00:04:00.155452 13530.909091
2018-06-01 00:05:00.155453 13530.909091
... ...
2018-06-01 23:55:00.735402 13285.000000
2018-06-01 23:56:00.894110 13285.000000
2018-06-01 23:57:00.894110 13285.000000
2018-06-01 23:58:00.894110 13285.000000
2018-06-01 23:59:00.894110 13285.000000
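Pandas has several functions for working with time series, and they are easier to use when the timestamps are the index.
What you want to do is downsample your data.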
Downsampling is to resample a time-series dataset to a wider time frame. For example, from minutes to hours, from days to years. The result will have a reduced number of rows and values can be aggregated with mean(), min(), max(), sum() etc.
Source: "Pandas resample() tricks you should know for manipulating time-series data"
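To make the quoted definition concrete, here is a minimal sketch of downsampling with several aggregations at once. The six readings and their values are made up for illustration; only resample() and agg() come from pandas.

```python
import pandas as pd

# Six hypothetical readings spanning two minutes (values are made up).
idx = pd.to_datetime([
    "2018-06-01 00:00:10", "2018-06-01 00:00:20", "2018-06-01 00:00:30",
    "2018-06-01 00:01:10", "2018-06-01 00:01:20", "2018-06-01 00:01:30",
])
s = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=idx)

# Downsample to one-minute bins and compute several aggregates per bin.
summary = s.resample("1min").agg(["mean", "min", "max", "sum"])
print(summary)
```

The six rows collapse into two, one per minute, each labelled with the start of its bin.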
Here is code that performs the task.
import numpy as np
import pandas as pd

# Seed NumPy's random number generator so the prices are reproducible
np.random.seed(42)

# Create one timestamp per second for the whole day
dates = pd.date_range(start="2018-06-01 00:00:00",
                      end="2018-06-01 23:59:59",
                      freq="s")

# Create an array of random prices, one per timestamp
prices = np.random.uniform(low=13285, high=13530.9, size=len(dates))

# Create the DataFrame, indexed by timestamp
df = pd.DataFrame(data=prices, index=dates)

# Resample to one-minute bins and take the mean of each bin
# ("min" is the minute alias; the older "T" alias is deprecated in recent pandas)
df.resample("1min").mean()
This is the output of one run (the exact values depend on the random prices drawn):
2018-06-01 00:00:00 13421.290908
2018-06-01 00:01:00 13414.707903
2018-06-01 00:02:00 13394.962477
2018-06-01 00:03:00 13413.036905
2018-06-01 00:04:00 13412.717874
...
2018-06-01 23:55:00 13412.137577
2018-06-01 23:56:00 13409.450838
2018-06-01 23:57:00 13411.499249
2018-06-01 23:58:00 13398.442782
2018-06-01 23:59:00 13412.034963
[1440 rows x 1 columns]
The original data contains 86,400 rows (one per second), while the downsampled data contains only 1,440 (one per minute).
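In the question the timestamps sit in a column rather than in the index, so the same idea can be sketched in two ways. The small `sell` frame below is a hypothetical stand-in for the asker's data, with made-up prices, and it assumes the `timestamp` column already has a datetime dtype.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `sell` frame (prices are made up).
sell = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-06-01 00:00:00.155449",
        "2018-06-01 00:00:01.155449",
        "2018-06-01 00:01:02.155451",
        "2018-06-01 00:01:03.155452",
    ]),
    "price": [100.0, 102.0, 200.0, 204.0],
})

# Option 1: group on the column directly with pd.Grouper.
by_grouper = sell.groupby(pd.Grouper(key="timestamp", freq="1min"))["price"].mean()

# Option 2: move the column into the index, then resample.
by_resample = sell.set_index("timestamp")["price"].resample("1min").mean()

print(by_grouper)
```

Both options label each bin with the start of its minute, so the microsecond part shown in the expected result is not retained. If the column holds strings, converting it with pd.to_datetime first would be needed.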