Aggregate and distribute time series data

Question

I have some time series data in a pandas DataFrame, as shown below:

begin	end	mw_values
2021-09-14 11:16:00	2021-09-14 11:27:11	0
2021-09-14 11:27:11	2021-09-14 11:30:00	100
2021-09-14 11:30:00	2021-09-14 11:33:59	1200
2021-09-14 11:33:59	2021-09-14 11:39:42	600
2021-09-14 11:39:42	2021-09-14 11:59:59	400

I need to distribute the sum of mw_values into 15-minute time slots, like this:

time_slots_15_min	sum_mw_values
2021-09-14 11:00	0
2021-09-14 11:15	100
2021-09-14 11:30	2200
2021-09-14 11:45	0
2021-09-14 12:00	0

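Does anyone know how I can do this?

Note that the interval between begin and end may overlap two time slots. In that case the value must be included in the sum of the slot in which it begins; see, for example, the mw_value 400 in the example above.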

Answer 1

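You can re-index your DataFrame by the begin column, insert two new rows so that the time range starts at 11:00 and ends at 12:00, and then use .resample("15min").sum(), which works on a DatetimeIndex (the pandas documentation for resample has more detail if you want to read further):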

import pandas as pd

# In case the column isn't already a datetime
df["begin"] = pd.to_datetime(df["begin"])

df = df.set_index("begin")

# Add padding rows so the range starts at 11:00 and ends at 12:00
df_start_end = pd.DataFrame(
    {"end": ["2021-09-14 11:15:00", "2021-09-14 12:15:00"], "mw_values": [0, 0]},
    index=pd.to_datetime(["2021-09-14 11:00:00", "2021-09-14 12:00:00"]),
)
df_final = pd.concat([df_start_end, df]).sort_index()

This is what df_final looks like:

                                     end  mw_values
2021-09-14 11:00:00  2021-09-14 11:15:00          0
2021-09-14 11:16:00  2021-09-14 11:27:11          0
2021-09-14 11:27:11  2021-09-14 11:30:00        100
2021-09-14 11:30:00  2021-09-14 11:33:59       1200
2021-09-14 11:33:59  2021-09-14 11:39:42        600
2021-09-14 11:39:42  2021-09-14 11:59:59        400
2021-09-14 12:00:00  2021-09-14 12:15:00          0

Then we resample on the DatetimeIndex and sum every 15 minutes:

# Sum every 15 minutes; select mw_values first so the non-numeric "end" column is left out
out = df_final["mw_values"].resample("15min").sum()
out = out.rename_axis("time_slots_15_min").reset_index(name="sum_mw_values")
print(out)

Output:

    time_slots_15_min  sum_mw_values
0 2021-09-14 11:00:00              0
1 2021-09-14 11:15:00            100
2 2021-09-14 11:30:00           2200
3 2021-09-14 11:45:00              0
4 2021-09-14 12:00:00              0
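
The 400 value lands in the 11:30 slot because resampling on begin assigns each row to the slot that contains its begin timestamp, regardless of where end falls. A minimal sketch of that rule, using Series.dt.floor with groupby rather than resample (a swapped-in technique for illustration, not the method of the answer above); the sample frame is rebuilt from the question's data and the variable names are assumptions:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "begin": pd.to_datetime(
            [
                "2021-09-14 11:16:00",
                "2021-09-14 11:27:11",
                "2021-09-14 11:30:00",
                "2021-09-14 11:33:59",
                "2021-09-14 11:39:42",
            ]
        ),
        "end": pd.to_datetime(
            [
                "2021-09-14 11:27:11",
                "2021-09-14 11:30:00",
                "2021-09-14 11:33:59",
                "2021-09-14 11:39:42",
                "2021-09-14 11:59:59",
            ]
        ),
        "mw_values": [0, 100, 1200, 600, 400],
    }
)

# Round each begin timestamp down to the start of its 15-minute slot
slot = df["begin"].dt.floor("15min")

# Sum the values per slot; only slots that contain at least one row appear
per_slot = df.groupby(slot)["mw_values"].sum()
print(per_slot)
```

Unlike the padded resample, this only returns slots that actually contain a begin timestamp, so empty slots such as 11:00 and 11:45 still have to be added separately if they are needed.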

Answer 2

You can resample the DataFrame to sum the data into 15-minute bins. Then you can reindex the result so that it matches the start, end, and frequency you want.

freq = "15min"
new_index = pd.date_range(
    "2021-09-14 11:00:00", "2021-09-14 12:00:00", freq=freq
)

out = (
    df.resample(freq, on="begin")["mw_values"]
    .sum()
    .reindex(new_index, fill_value=0)
    .to_frame("sum_mw_values")
)

print(out)
                     sum_mw_values
2021-09-14 11:00:00              0
2021-09-14 11:15:00            100
2021-09-14 11:30:00           2200
2021-09-14 11:45:00              0
2021-09-14 12:00:00              0
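
To try this approach end to end, here is a self-contained sketch that rebuilds the question's sample data. As an assumption not found in the answer, the start and end of the target index are derived from the data by rounding to the full hour instead of being hard-coded; adjust that rounding if your slots should cover a different range.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "begin": pd.to_datetime(
            [
                "2021-09-14 11:16:00",
                "2021-09-14 11:27:11",
                "2021-09-14 11:30:00",
                "2021-09-14 11:33:59",
                "2021-09-14 11:39:42",
            ]
        ),
        "end": pd.to_datetime(
            [
                "2021-09-14 11:27:11",
                "2021-09-14 11:30:00",
                "2021-09-14 11:33:59",
                "2021-09-14 11:39:42",
                "2021-09-14 11:59:59",
            ]
        ),
        "mw_values": [0, 100, 1200, 600, 400],
    }
)

freq = "15min"

# Assumption: the output should span whole hours around the data
start = df["begin"].min().floor("60min")
end = df["end"].max().ceil("60min")
new_index = pd.date_range(start, end, freq=freq)

out = (
    df.resample(freq, on="begin")["mw_values"]  # bin rows by their begin time
    .sum()
    .reindex(new_index, fill_value=0)  # add the empty slots as zeros
    .to_frame("sum_mw_values")
)
print(out)
```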

