Pandas、datetime & logic:汇总一列中特定行的总和

Pandas, datetime & logic: Summarizing the sum of specific rows in a column

我得到了一个 DataFrame:

    date        start       end         inter
0   01-09-2020  10:01:24    10:02:13    0 days 00:00:49
1   01-09-2020  10:04:21    10:22:01    0 days 00:17:40
2   01-09-2020  10:16:14    10:25:06    0 days 00:08:52
3   01-09-2020  10:28:38    10:28:40    0 days 00:00:02
4   01-09-2020  10:37:38    10:37:41    0 days 00:00:03
... ... ... ... ...
995 17-09-2020  12:19:03    12:21:06    0 days 00:02:03
996 17-09-2020  12:22:53    12:22:58    0 days 00:00:05
997 17-09-2020  12:25:11    12:25:14    0 days 00:00:03
998 17-09-2020  12:27:07    12:27:08    0 days 00:00:01
999 17-09-2020  12:29:03    12:29:05    0 days 00:00:02
1000 rows × 4 columns

我想创建一个新的 df,但是 'inter' 的总和在特定的日期时间范围内。例如:

    new_date    start_range end_range   inter_sum
0   01-09-2020  10:00:00    10:59:59    0 days 01:15:36 
1   01-09-2020  11:00:00    11:59:59    0 days 00:58:30
...
997 17-09-2020  10:00:00    10:59:59    0 days 03:00:15
998 17-09-2020  11:00:00    11:59:59    0 days 00:47:20

其中 'inter_sum' 是 'start_range' 和 'end_range' 之间的 'inter' 值的总和,基于之前 df 的 'start' 和 'end' .

试试 resample:

#convert date column to datetime if needed
df["date"] = pd.to_datetime(df["date"])

#convert time column to datetime for resample
df["start"] = pd.to_datetime(df["date"].astype(str)+"T"+df["start"].astype(str))

#resample on start datetime every hour and sum
output = df.resample("H", on="start")["inter"].sum()

#formatting to match expected output
output["date"] = output["start"].dt.date
output["end"] = (output["start"] + pd.Timedelta(minutes=59, seconds=59)).dt.time
output["start"] = output["start"].dt.time
output = output[["date", "start", "end", "inter_sum"]]

>>> output
            date     start       end       inter_sum
0     2020-01-09  10:00:00  10:59:59 0 days 00:27:26
1     2020-01-09  11:00:00  11:59:59 0 days 00:00:00
2     2020-01-09  12:00:00  12:59:59 0 days 00:00:00
3     2020-01-09  13:00:00  13:59:59 0 days 00:00:00
4     2020-01-09  14:00:00  14:59:59 0 days 00:00:00
         ...       ...       ...             ...
6046  2020-09-17  08:00:00  08:59:59 0 days 00:00:00
6047  2020-09-17  09:00:00  09:59:59 0 days 00:00:00
6048  2020-09-17  10:00:00  10:59:59 0 days 00:00:00
6049  2020-09-17  11:00:00  11:59:59 0 days 00:00:00
6050  2020-09-17  12:00:00  12:59:59 0 days 00:02:14