Pandas - Get all rows between two dates, but only specific weekdays, and time periods
Let's say I have a dataframe that looks like this:
                     usage_price
2017-04-01 00:00:00            1
2017-04-01 00:30:00            1
2017-04-01 01:00:00            1
2017-04-01 01:30:00            1
2017-04-01 02:00:00            1
...                          ...
2018-12-31 22:00:00            1
2018-12-31 22:30:00            1
2018-12-31 23:00:00            1
2018-12-31 23:30:00            1
What I would like to do is update usage_price for specific rows. In my case, I want to update based on this object:
{'day': '1', 'timerange': ['01 01 00:00', '31 12 08:00']}
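That is:
- Update all Mondays ('day': '1')
- Between 00:00 and 08:00
- For any Monday between 01-01 (1 January) and 31-12 (31 December), ignoring the year
I know how to do each of these things separately:
    df_timeseries.loc[df_timeseries.index.weekday == 0, 'usage_price']
    df_timeseries.loc[df_timeseries.between_time('00:00', '08:00', include_end=False).index, 'usage_price']
But I'm a bit stumped on how to get the rows between dates (ignoring the year), and on how to combine everything together. Help would be much appreciated!
Edit: This is how far I've got, but I can't seem to get it to work (I get a syntax error), and I don't think I'm building the mask additively in the right way: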
    def _create_mask_from_tpr(self, df: pd.DataFrame, tpr: Dict[str, Union[str, List[str]]]) -> Tuple:
        print(tpr)
        weekday = int(tpr['day']) - 1  # Offset.
        start_day, start_month, start_time = tpr['timerange'][0].split(" ")
        end_day, end_month, end_time = tpr['timerange'][1].split(" ")
        start_year, end_year = df.index.min().year, df.index.max().year
        selection_weekday = (df.index.weekday == weekday)
        selection_time = (df.between_time(start_time, end_time))
        selection_date = None
        for year in range(start_year, end_year + 1):
            start_date = pd.to_datetime("{}-{}-{}".format(year, start_month, start_day))
            end_date = pd.to_datetime("{}-{}-{}".format(year, end_month, end_day))
            selection_date = selection_date | (df.index.date() >= start_date & df.index.date() <= end_date)
        mask = (selection_weekday & selection_time & selection_date)
        print(mask)
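Untested, but something along the following lines might work:
    from datetime import time

    # between_time() returns a filtered DataFrame, not a boolean mask,
    # so the time-of-day condition is written against index.time instead.
    selection = ((df_timeseries.index.weekday == 0) &
                 (df_timeseries.index.time >= time(0, 0)) &
                 (df_timeseries.index.time < time(8, 0)))
    result = df_timeseries.loc[selection, 'usage_price']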
In general, you can combine comparisons with the | or & operators (but do use parentheses around each comparison).
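On the parentheses point, here is a minimal sketch of why they matter. The sample frame `df_demo` and the variable names are made up for illustration and are not the asker's data. In Python, `&` binds more tightly than `==` or `<`, so each comparison has to be wrapped before the masks are combined.

```python
import pandas as pd

# Hypothetical half-hourly sample covering one week (2018-01-01 is a Monday).
idx = pd.date_range("2018-01-01", "2018-01-07 23:30", freq="30min")
df_demo = pd.DataFrame({"usage_price": 1}, index=idx)

# Each comparison is parenthesized, then the boolean arrays are combined.
is_monday = (df_demo.index.weekday == 0)
is_early = (df_demo.index.hour < 8)
monday_early = is_monday & is_early

# Without parentheses, `weekday == 0 & hour < 8` is parsed as the chained
# comparison `weekday == (0 & hour) < 8`, which is not the intended mask.
print(monday_early.sum())
```

The combined mask can then be passed to `.loc`, for example `df_demo.loc[monday_early, "usage_price"]`.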
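Since your start and end dates cover the whole year, I have not filtered on them.
If you want to select dates without specifying a year, you will run into problems when doing something like the following: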
    selection = ((df_timeseries.index.day >= 5) &
                 (df_timeseries.index.day <= 20) &
                 (df_timeseries.index.month >= 2) &
                 (df_timeseries.index.month <= 3))
because you will now miss the end of February (days > 20) and the start of March (days < 5).
Using df_timeseries.index.dayofyear instead could work, except in leap years: there you would miss a day at the end of the date span.
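To make the leap-year caveat concrete, a small sketch follows. The dates are chosen only for illustration: the same calendar date maps to a different day-of-year once 29 February has passed, so a day-of-year bound computed from a non-leap year cuts the span one day short in a leap year.

```python
import pandas as pd

# 2 March has a different day-of-year in a non-leap and a leap year.
print(pd.Timestamp("2018-03-02").dayofyear)  # 61
print(pd.Timestamp("2020-03-02").dayofyear)  # 62

# Upper bound taken from a non-leap year, applied to a leap-year range.
end_doy = pd.Timestamp("2018-03-02").dayofyear
idx = pd.date_range("2020-02-05", "2020-03-02", freq="D")
missed = idx[idx.dayofyear > end_doy]
print(missed)
```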
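I don't know of an easy way to filter on a date range while ignoring the year. You will probably have to loop over the years of interest and compare the full year-month-day range for each year, combining each subselection with |. This is also another example of chaining more complex selections using | and &: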
    import numpy as np
    import pandas as pd

    start = '02-05'
    end = '03-02'
    subsel = np.zeros(len(df), dtype=bool)  # include no dates by default
    for year in range(2018, 2050):
        startdate = pd.to_datetime(str(year) + '-' + start).date()
        enddate = pd.to_datetime(str(year) + '-' + end).date()
        # Parenthesize each comparison: & binds more tightly than >= and <=.
        subsel = subsel | ((df.index.date >= startdate) & (df.index.date <= enddate))
    selection = selection & subsel
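As a possible alternative to the per-year loop (this is a different technique from the one in the answer, offered only as a sketch), each date can be encoded as `month * 100 + day` and compared as an integer, which ignores the year entirely. The helper name `month_day_mask` and its 'MM-DD' string arguments are assumptions made for this example, not part of pandas.

```python
import numpy as np
import pandas as pd

def month_day_mask(index: pd.DatetimeIndex, start: str, end: str) -> np.ndarray:
    """Boolean mask for dates whose (month, day) lies in [start, end], any year.

    `start` and `end` are 'MM-DD' strings. Hypothetical helper, not a pandas API.
    """
    start_key = int(start[:2]) * 100 + int(start[3:])
    end_key = int(end[:2]) * 100 + int(end[3:])
    # Encode each date as month * 100 + day, e.g. 2 March -> 302.
    keys = np.asarray(index.month * 100 + index.day)
    if start_key <= end_key:
        return (keys >= start_key) & (keys <= end_key)
    # A range such as '12-20' to '01-05' wraps around the new year.
    return (keys >= start_key) | (keys <= end_key)

idx = pd.date_range("2019-01-01", "2020-12-31", freq="D")
mask = month_day_mask(idx, "02-05", "03-02")
print(mask.sum())
```

Because the key depends only on month and day, 29 February is handled without special cases, and the resulting mask can be combined with other conditions via `&` as above.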
Final solution:
    import datetime
    from typing import Dict, List, Union

    import numpy as np
    import pandas as pd

    def _create_mask_from_tpr(self, df: pd.DataFrame, tpr: Dict[str, Union[str, List[str]]]) -> np.ndarray:
        weekday = int(tpr['day']) - 1  # Offset: 'day' is 1-based, pandas weekdays are 0-based.
        start_day, start_month, start_time = tpr['timerange'][0].split(" ")
        end_day, end_month, end_time = tpr['timerange'][1].split(" ")
        start_year, end_year = df.index.min().year, df.index.max().year
        selection_weekday = (df.index.weekday == weekday)
        start_time = datetime.datetime.strptime(start_time, '%H:%M').time()
        end_time = datetime.datetime.strptime(end_time, '%H:%M').time()
        selection_time = ((df.index.time >= start_time) & (df.index.time <= end_time))
        # Compare midnight-normalized dates so rows later in the day on end_date are kept.
        dates = df.index.normalize()
        # Start with nothing selected; testing an array with `if` would raise ValueError.
        selection_date = np.zeros(len(df), dtype=bool)
        for year in range(start_year, end_year + 1):
            start_date = pd.Timestamp("{}-{}-{}".format(year, start_month, start_day))
            end_date = pd.Timestamp("{}-{}-{}".format(year, end_month, end_day))
            selection_date = selection_date | ((dates >= start_date) & (dates <= end_date))
        return (selection_weekday & selection_time & selection_date)
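For completeness, here is a rough usage sketch showing how such a mask could be applied to update usage_price. The standalone function name `create_mask`, the sample frame, and the new price of 2 are all assumptions for illustration. The sketch also assumes the end date is meant to be inclusive, so it compares normalized dates rather than raw timestamps.

```python
import datetime

import numpy as np
import pandas as pd

def create_mask(df, tpr):
    """Standalone variant of the method above (the name is hypothetical)."""
    weekday = int(tpr["day"]) - 1  # 'day' is 1-based, pandas weekdays are 0-based
    start_day, start_month, start_time = tpr["timerange"][0].split(" ")
    end_day, end_month, end_time = tpr["timerange"][1].split(" ")
    t0 = datetime.datetime.strptime(start_time, "%H:%M").time()
    t1 = datetime.datetime.strptime(end_time, "%H:%M").time()
    mask_weekday = df.index.weekday == weekday
    mask_time = (df.index.time >= t0) & (df.index.time <= t1)
    dates = df.index.normalize()  # drop the time of day for the date test
    mask_date = np.zeros(len(df), dtype=bool)
    for year in range(df.index.min().year, df.index.max().year + 1):
        lo = pd.Timestamp(year=year, month=int(start_month), day=int(start_day))
        hi = pd.Timestamp(year=year, month=int(end_month), day=int(end_day))
        mask_date |= np.asarray((dates >= lo) & (dates <= hi))
    return mask_weekday & mask_time & mask_date

# Sample frame shaped like the one in the question.
idx = pd.date_range("2017-04-01", "2018-12-31 23:30", freq="30min")
df = pd.DataFrame({"usage_price": 1}, index=idx)
tpr = {"day": "1", "timerange": ["01 01 00:00", "31 12 08:00"]}

mask = create_mask(df, tpr)
df.loc[mask, "usage_price"] = 2  # update only the selected rows
print(df.loc[mask].head())
```

Note that both time bounds are inclusive here, matching the final solution; use `<` on the upper bound if 08:00 itself should be excluded.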