Pandas upsample rows with a start and end time

I have a DataFrame of the following form:

In [5]: df = pd.DataFrame({
   ...:     'start_time': ['2022-01-01 01:15', '2022-01-01 13:00'],
   ...:     'end_time': ['2022-01-01 03:45', '2022-01-01 15:00'],
   ...:     'values': [1000, 750]})

In [6]: df
Out[6]: 
         start_time          end_time  values
0  2022-01-01 01:15  2022-01-01 03:45    1000
1  2022-01-01 13:00  2022-01-01 15:00     750

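I want to convert this into 24 hourly values, splitting each value proportionally across the hours covered by its start_time/end_time range. For the example above, this should produce: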

In [10]: result
Out[10]: 
                     value
2022-01-01 00:00:00      0
2022-01-01 01:00:00    300
2022-01-01 02:00:00    400
2022-01-01 03:00:00    300
2022-01-01 04:00:00      0
2022-01-01 05:00:00      0
2022-01-01 06:00:00      0
2022-01-01 07:00:00      0
2022-01-01 08:00:00      0
2022-01-01 09:00:00      0
2022-01-01 10:00:00      0
2022-01-01 11:00:00      0
2022-01-01 12:00:00      0
2022-01-01 13:00:00    375
2022-01-01 14:00:00    375
2022-01-01 15:00:00      0
2022-01-01 16:00:00      0
2022-01-01 17:00:00      0
2022-01-01 18:00:00      0
2022-01-01 19:00:00      0
2022-01-01 20:00:00      0
2022-01-01 21:00:00      0
2022-01-01 22:00:00      0
2022-01-01 23:00:00      0
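
To spell out the arithmetic behind these numbers, here is a small illustrative check for the first row only. The variable names are just for this illustration, and the per-hour minute counts are written out by hand rather than computed:

```python
from datetime import datetime

# First row of the example: 01:15 to 03:45 with a value of 1000.
start = datetime(2022, 1, 1, 1, 15)
end = datetime(2022, 1, 1, 3, 45)
total_minutes = (end - start).total_seconds() / 60  # length of the range in minutes

# Minutes of the range that fall inside the 01:00, 02:00 and 03:00 hours,
# counted by hand: 01:15-02:00, 02:00-03:00, 03:00-03:45.
minutes_per_hour = {1: 45, 2: 60, 3: 45}

# Each hour receives the value scaled by its share of the range.
shares = {hour: 1000 * m / total_minutes for hour, m in minutes_per_hour.items()}
print(shares)  # {1: 300.0, 2: 400.0, 3: 300.0}
```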

The start_time/end_time ranges do not overlap. Any suggestions on how to achieve this?

Use:

import pandas as pd

# duration of each row in minutes
df['diff'] = pd.to_datetime(df['end_time']).sub(pd.to_datetime(df['start_time'])).dt.total_seconds().div(60)

# build one timestamp per minute of each range, keyed by the original row label
# (inclusive='left' needs pandas >= 1.4; older versions spell it closed='left')
s = pd.concat([pd.Series(r.Index, pd.date_range(r.start_time, r.end_time, freq='min', inclusive='left')) for r in df.itertuples()])
s = pd.Series(s.index, s.to_numpy(), name='new')
df = df.join(s)

# resample to hours: keep each row's value and duration, and count the minutes in each hour
# (the lowercase 'h' alias replaces 'H', which is deprecated in recent pandas)
hourly = df.set_index('new').resample('h')
df = hourly[['values', 'diff']].first().assign(new=hourly.size())
# scale each value by the share of its range that falls in the hour
df['value'] = df['values'].mul(df['new'].div(df['diff'])).fillna(0)

# add the missing hours so the result covers the whole day
r = pd.date_range(df.index.min().normalize(), df.index.max().normalize() + pd.Timedelta(hours=23), freq='h')
df = df[['value']].reindex(r, fill_value=0)

print(df)
                     value
2022-01-01 00:00:00    0.0
2022-01-01 01:00:00  300.0
2022-01-01 02:00:00  400.0
2022-01-01 03:00:00  300.0
2022-01-01 04:00:00    0.0
2022-01-01 05:00:00    0.0
2022-01-01 06:00:00    0.0
2022-01-01 07:00:00    0.0
2022-01-01 08:00:00    0.0
2022-01-01 09:00:00    0.0
2022-01-01 10:00:00    0.0
2022-01-01 11:00:00    0.0
2022-01-01 12:00:00    0.0
2022-01-01 13:00:00  375.0
2022-01-01 14:00:00  375.0
2022-01-01 15:00:00    0.0
2022-01-01 16:00:00    0.0
2022-01-01 17:00:00    0.0
2022-01-01 18:00:00    0.0
2022-01-01 19:00:00    0.0
2022-01-01 20:00:00    0.0
2022-01-01 21:00:00    0.0
2022-01-01 22:00:00    0.0
2022-01-01 23:00:00    0.0
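
If the ranges are long, or the times are not aligned to whole minutes, expanding every range into one row per minute may get expensive or lose precision. A possible alternative, sketched here under the same assumptions as the question (non-overlapping ranges, columns named start_time, end_time and values), computes the overlap between each range and each hourly bin directly. The helper name split_by_hour is made up for this sketch and is not a pandas function:

```python
import numpy as np
import pandas as pd


def split_by_hour(df):
    """Spread each row's value over hourly bins in proportion to the overlap."""
    start = pd.to_datetime(df['start_time'])
    end = pd.to_datetime(df['end_time'])
    one_hour = pd.Timedelta(hours=1)
    one_minute = pd.Timedelta(minutes=1)

    # One bin per hour for every calendar day touched by the data.
    hours = pd.date_range(start.min().normalize(),
                          end.max().normalize() + 23 * one_hour,
                          freq=one_hour)
    # Bin edges as minutes since the first bin, so the overlap is plain arithmetic.
    bin_start = ((hours - hours[0]) / one_minute).to_numpy()
    bin_end = bin_start + 60.0

    result = pd.Series(0.0, index=hours, name='value')
    for s, e, v in zip(start, end, df['values']):
        s_min = (s - hours[0]) / one_minute
        e_min = (e - hours[0]) / one_minute
        if e_min <= s_min:
            continue  # skip empty or inverted ranges
        # Minutes of [s, e) that fall inside each bin, never negative.
        overlap = np.clip(np.minimum(bin_end, e_min) - np.maximum(bin_start, s_min), 0, None)
        result += v * overlap / (e_min - s_min)
    return result.to_frame()


df = pd.DataFrame({
    'start_time': ['2022-01-01 01:15', '2022-01-01 13:00'],
    'end_time': ['2022-01-01 03:45', '2022-01-01 15:00'],
    'values': [1000, 750]})
result = split_by_hour(df)
print(result)
```

This still loops over the rows, so it is a sketch rather than a tuned solution, but the work per row is a handful of vectorized operations over the hourly bins instead of one generated row per minute.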