Pandas - Get all rows between two dates, but only specific weekdays and time periods

Suppose I have a DataFrame like this:

                     usage_price
2017-04-01 00:00:00            1
2017-04-01 00:30:00            1
2017-04-01 01:00:00            1
2017-04-01 01:30:00            1
2017-04-01 02:00:00            1
...                          ...
2018-12-31 22:00:00            1
2018-12-31 22:30:00            1
2018-12-31 23:00:00            1
2018-12-31 23:30:00            1

What I want to do is update usage_price for specific rows. In my case, I want to update based on this object:

{'day': '1', 'timerange': ['01 01 00:00', '31 12 08:00']}

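That is: every Monday (day 1), between 1 January and 31 December of any year, from 00:00 to 08:00.

I know how to do each of these things separately, but I'm a bit stuck on how to get the rows between the dates (ignoring the year) and how to put it all together. Any help is much appreciated!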

Edit: This is how far I've got, but I can't get it to work (I get a syntax error), and I don't think I'm building the mask additively in the right way:

def _create_mask_from_tpr(self, df: pd.DataFrame, tpr: Dict[str, Union[str, List[str]]]) -> Tuple:
    print(tpr)
    weekday = int(tpr['day']) - 1 # Offset.
    start_day, start_month, start_time = tpr['timerange'][0].split(" ")
    end_day, end_month, end_time = tpr['timerange'][1].split(" ")
    start_year, end_year = df.index.min().year, df.index.max().year
    selection_weekday = (df.index.weekday == weekday)
    selection_time = (df.between_time(start_time, end_time))

    selection_date = None
    for year in range(start_year, end_year + 1):
        start_date = pd.to_datetime("{}-{}-{}".format(year, start_month, start_day))
        end_date = pd.to_datetime("{}-{}-{}".format(year, end_month, end_day))
        selection_date = selection_date | (df.index.date() >= start_date & df.index.date() <= end_date)
    mask = (selection_weekday & selection_time & selection_date)
    print(mask)

Untested, but something along these lines might work:

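import datetime

# Boolean mask: Mondays (weekday 0) with a time of day in [00:00, 08:00).
selection = ((df_timeseries.index.weekday == 0) &
             (df_timeseries.index.time >= datetime.time(0, 0)) &
             (df_timeseries.index.time < datetime.time(8, 0)))
result = df_timeseries.loc[selection, 'usage_price']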

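In general, you can combine comparisons with the | and & operators (but do use parentheses). Since the start and end dates cover the whole year, I have not filtered on them.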
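
To illustrate why the parentheses matter, here is a minimal sketch on a made-up half-hourly index. The frame `demo` and its contents are assumptions for illustration, not the asker's data. In Python, `&` binds more tightly than `==` or `<`, so each comparison has to be wrapped before the masks are combined.

```python
import datetime

import pandas as pd

# Hypothetical sample: one week of half-hourly rows starting on Monday 2018-01-01.
idx = pd.date_range("2018-01-01", "2018-01-07 23:30", freq="30min")
demo = pd.DataFrame({"usage_price": 1}, index=idx)

# Each comparison is parenthesised, then the boolean arrays are combined with &.
# Without the parentheses, `a == 0 & b` would be parsed as `a == (0 & b)`.
is_monday = (demo.index.weekday == 0)
is_early = (demo.index.time < datetime.time(8, 0))
mask = is_monday & is_early

selected = demo.loc[mask, "usage_price"]
print(len(selected))  # 16 half-hour slots: Monday 00:00 up to 07:30
```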

If you want to select dates without specifying a year, you will run into problems when doing, for example, something like the following:

selection = ((df_timeseries.index.day >= 5) &
             (df_timeseries.index.day <= 20) &
             (df_timeseries.index.month >= 2) &
             (df_timeseries.index.month <= 3))

because you would then miss the end of February (days > 20) and the beginning of March (days < 5).

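Using df_timeseries.index.dayofyear instead would work, except in leap years: there the day numbers after 29 February shift by one, so you would miss one day at the end of the date span.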
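
The leap-year shift is easy to see with a quick check. The two dates below are arbitrary examples chosen for illustration:

```python
import pandas as pd

# 1 March lands on a different day-of-year depending on whether
# the year contains a 29 February.
leap = pd.Timestamp("2016-03-01").dayofyear    # 2016 is a leap year
common = pd.Timestamp("2017-03-01").dayofyear  # 2017 is not

print(leap, common)  # 61 60
```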

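I don't know of a simple way to filter on a date range while ignoring the year. You will probably have to loop over the years of interest and compare against the full year-month-day range for each year, combining each sub-selection with |. This is also another example of chaining a more complex selection using | and &:
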
import numpy as np
import pandas as pd

start = '02-05'
end = '03-02'
subsel = np.zeros(len(df_timeseries), dtype=bool)  # include no dates by default
years = np.arange(2018, 2050)
for year in years:
    startdate = pd.to_datetime(str(year) + '-' + start).date()
    enddate = pd.to_datetime(str(year) + '-' + end).date()
    # Parenthesise each comparison: & binds more tightly than >= and <=.
    subsel = subsel | ((df_timeseries.index.date >= startdate) &
                       (df_timeseries.index.date <= enddate))
selection = selection & subsel
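
As a possible alternative to looping over years, the month and day can be folded into a single integer (month * 100 + day) and compared directly. This is a sketch of a different technique, not something taken from the answer above. The helper name `month_day_mask` is hypothetical, and it assumes the range does not wrap around the new year.

```python
import numpy as np
import pandas as pd


def month_day_mask(index: pd.DatetimeIndex, start: str, end: str) -> np.ndarray:
    """Boolean mask for rows whose (month, day) lies in [start, end], in any year.

    `start` and `end` are 'MM-DD' strings. Assumes start <= end within one
    calendar year (no wrap-around from December into January).
    """
    start_month, start_day = (int(part) for part in start.split("-"))
    end_month, end_day = (int(part) for part in end.split("-"))
    # Encode e.g. 5 February as 205 and 2 March as 302, so that ordering by the
    # encoded value matches ordering by calendar date within a year.
    encoded = np.asarray(index.month) * 100 + np.asarray(index.day)
    lower = start_month * 100 + start_day
    upper = end_month * 100 + end_day
    return (encoded >= lower) & (encoded <= upper)


# Two years of daily rows, one of them a leap year.
idx = pd.date_range("2019-01-01", "2020-12-31", freq="D")
mask = month_day_mask(idx, "02-05", "03-02")
print(int(mask.sum()))  # 53: 26 days in 2019 plus 27 in 2020 (29 February included)
```

Because the comparison uses month and day rather than day-of-year, 29 February is picked up in leap years without special handling.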

Final solution:

import datetime
from typing import Dict, List, Union

import numpy as np
import pandas as pd

def _create_mask_from_tpr(self, df: pd.DataFrame, tpr: Dict[str, Union[str, List[str]]]) -> np.ndarray:
    weekday = int(tpr['day']) - 1  # Offset: day '1' maps to pandas weekday 0 (Monday).
    start_day, start_month, start_time = tpr['timerange'][0].split(" ")
    end_day, end_month, end_time = tpr['timerange'][1].split(" ")
    start_year, end_year = df.index.min().year, df.index.max().year
    selection_weekday = (df.index.weekday == weekday)

    start_time = datetime.datetime.strptime(start_time, '%H:%M').time()
    end_time = datetime.datetime.strptime(end_time, '%H:%M').time()
    selection_time = ((df.index.time >= start_time) & (df.index.time <= end_time))

    # Compare on calendar days so that every row on the end date is included.
    days = df.index.normalize()
    # Start with nothing selected; testing an array with `if selection_date:` would raise.
    selection_date = np.zeros(len(df), dtype=bool)
    for year in range(start_year, end_year + 1):
        start_date = pd.Timestamp("{}-{}-{}".format(year, start_month, start_day))
        end_date = pd.Timestamp("{}-{}-{}".format(year, end_month, end_day))
        selection_date = selection_date | ((days >= start_date) & (days <= end_date))
    return (selection_weekday & selection_time & selection_date)
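
To tie this back to the original goal of updating usage_price, here is a sketch of how such a mask might be applied with `.loc`. It uses a standalone variant of the function (named `create_mask` here, a hypothetical name) on a rebuilt copy of the example frame. The new price `5` is an arbitrary value for illustration. The sketch compares calendar days via `normalize()` so that rows on the end date itself are kept, which is an assumption about the intended behaviour.

```python
import datetime

import numpy as np
import pandas as pd


def create_mask(df: pd.DataFrame, tpr: dict) -> np.ndarray:
    """Standalone sketch: weekday + time-of-day + year-agnostic date range."""
    weekday = int(tpr["day"]) - 1  # '1' -> Monday (pandas weekday 0)
    start_day, start_month, start_time = tpr["timerange"][0].split(" ")
    end_day, end_month, end_time = tpr["timerange"][1].split(" ")
    start_time = datetime.datetime.strptime(start_time, "%H:%M").time()
    end_time = datetime.datetime.strptime(end_time, "%H:%M").time()

    on_weekday = df.index.weekday == weekday
    in_time = (df.index.time >= start_time) & (df.index.time <= end_time)

    days = df.index.normalize()  # midnight of each row's calendar day
    in_dates = np.zeros(len(df), dtype=bool)
    for year in range(df.index.min().year, df.index.max().year + 1):
        start_date = pd.Timestamp(f"{year}-{start_month}-{start_day}")
        end_date = pd.Timestamp(f"{year}-{end_month}-{end_day}")
        in_dates |= (days >= start_date) & (days <= end_date)
    return on_weekday & in_time & in_dates


# Rebuild a frame shaped like the one in the question.
index = pd.date_range("2017-04-01", "2018-12-31 23:30", freq="30min")
df = pd.DataFrame({"usage_price": 1}, index=index)

tpr = {"day": "1", "timerange": ["01 01 00:00", "31 12 08:00"]}
mask = create_mask(df, tpr)
df.loc[mask, "usage_price"] = 5  # arbitrary new price for the matching rows
```

Assigning through `df.loc[mask, column]` updates the frame in place and avoids the chained-indexing pitfalls of `df[mask][column] = ...`.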