Pandas:计算 2 列之间的时间(以分钟为单位),不包括周末、public 节假日并考虑营业时间
Pandas: Calculate time in minutes between 2 columns, excluding weekends, public holidays and taking business hours into account
我有以下问题,我觉得我离解决它只有几步之遥,但我还没有足够的经验。为此,我使用了 business-duration。
我查看了其他类似的答案并尝试了很多方法,但这是我得到的最接近的答案 (Using this answer)。我正在使用 Anaconda 和 Spyder,这是目前我在工作笔记本电脑上使用的唯一方法。我无法将一些自定义工作日功能安装到 anaconda 中。
我有一个大型数据集(约 20 万行),我需要解决此问题:
import pandas as pd
import business_duration as bd
import datetime as dt
import holidays as pyholidays
#Specify Business Working hours (8am - 5pm)
Bus_start_time = dt.time(8,00,0)
Bus_end_time = dt.time(17,0,0)
holidaylist = pyholidays.ZA()
unit='min'
list = [[10, '2022-01-01 07:00:00', '2022-01-08 15:00:00'], [11, '2022-01-02 18:00:00', '2022-01-10 15:30:00'],
[12, '2022-01-01 09:15:00', '2022-01-08 12:00:00'], [13, '2022-01-07 13:00:00', '2022-01-23 17:00:00']]
df = pd.DataFrame(list, columns =['ID', 'Start', 'End'])
print(df)
给出:
ID Start End
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00
下一步用于测试单个日期:
startdate = pd.to_datetime('2022-01-01 00:00:00')
enddate = pd.to_datetime('2022-01-14 23:00:00')
df['TimeAdj'] = bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit)
print(df)
这导致:
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 5400.0
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 5400.0
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 5400.0
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 5400.0
出于某种原因,我显示了浮点值,但我可以稍后修复它。
接下来,我需要在数据框中每行计算 运行。
我尝试替换开始日期和结束日期中的 df 列,但出现错误:
startdate = df['Start']
enddate = df['End']
print(bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit))`
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
然后我检查了业务持续时间的文档,并调整为以下内容:
from itertools import repeat
df['TimeAdj'] = list(map(bd.businessDuration,startdate,enddate,repeat(Bus_start_time),repeat(Bus_end_time),repeat(holidaylist),repeat(unit)))
AttributeError: 'str' object has no attribute 'date'
我希望在 TimeAdj 列的每一行中以正确的值结束 (添加了示例数字)。
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 2300
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 2830
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 2115
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 4800
我需要对此进行哪些调整?
使用:
from functools import partial
# Convert strings to datetime
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
# Get holidays list
years = range(df['Start'].min().year, df['End'].max().year+1)
holidaylist = pyholidays.ZA(years=years).keys()
# Create a partial function as a shortcut
bduration = partial(bd.businessDuration,
starttime=Bus_start_time, endtime=Bus_end_time,
holidaylist=holidaylist, unit=unit)
# Compute business duration
df['TimeAdj'] = df.apply(lambda x: bduration(x['Start'], x['End']), axis=1)
输出:
>>> df
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 2700.0
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 3150.0
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 2700.0
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 5640.0
我有以下问题,我觉得我离解决它只有几步之遥,但我还没有足够的经验。为此,我使用了 business-duration。 我查看了其他类似的答案并尝试了很多方法,但这是我得到的最接近的答案 (Using this answer)。我正在使用 Anaconda 和 Spyder,这是目前我在工作笔记本电脑上使用的唯一方法。我无法将一些自定义工作日功能安装到 anaconda 中。
我有一个大型数据集(约 20 万行),我需要解决此问题:
import pandas as pd
import business_duration as bd
import datetime as dt
import holidays as pyholidays
#Specify Business Working hours (8am - 5pm)
Bus_start_time = dt.time(8,00,0)
Bus_end_time = dt.time(17,0,0)
holidaylist = pyholidays.ZA()
unit='min'
list = [[10, '2022-01-01 07:00:00', '2022-01-08 15:00:00'], [11, '2022-01-02 18:00:00', '2022-01-10 15:30:00'],
[12, '2022-01-01 09:15:00', '2022-01-08 12:00:00'], [13, '2022-01-07 13:00:00', '2022-01-23 17:00:00']]
df = pd.DataFrame(list, columns =['ID', 'Start', 'End'])
print(df)
给出:
ID Start End
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00
下一步用于测试单个日期:
startdate = pd.to_datetime('2022-01-01 00:00:00')
enddate = pd.to_datetime('2022-01-14 23:00:00')
df['TimeAdj'] = bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit)
print(df)
这导致:
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 5400.0
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 5400.0
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 5400.0
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 5400.0
出于某种原因,我显示了浮点值,但我可以稍后修复它。 接下来,我需要在数据框中每行计算 运行。
我尝试替换开始日期和结束日期中的 df 列,但出现错误:
startdate = df['Start']
enddate = df['End']
print(bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit))`
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
然后我检查了业务持续时间的文档,并调整为以下内容:
from itertools import repeat
df['TimeAdj'] = list(map(bd.businessDuration,startdate,enddate,repeat(Bus_start_time),repeat(Bus_end_time),repeat(holidaylist),repeat(unit)))
AttributeError: 'str' object has no attribute 'date'
我希望在 TimeAdj 列的每一行中以正确的值结束 (添加了示例数字)。
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 2300
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 2830
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 2115
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 4800
我需要对此进行哪些调整?
使用:
from functools import partial
# Convert strings to datetime
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
# Get holidays list
years = range(df['Start'].min().year, df['End'].max().year+1)
holidaylist = pyholidays.ZA(years=years).keys()
# Create a partial function as a shortcut
bduration = partial(bd.businessDuration,
starttime=Bus_start_time, endtime=Bus_end_time,
holidaylist=holidaylist, unit=unit)
# Compute business duration
df['TimeAdj'] = df.apply(lambda x: bduration(x['Start'], x['End']), axis=1)
输出:
>>> df
ID Start End TimeAdj
0 10 2022-01-01 07:00:00 2022-01-08 15:00:00 2700.0
1 11 2022-01-02 18:00:00 2022-01-10 15:30:00 3150.0
2 12 2022-01-01 09:15:00 2022-01-08 12:00:00 2700.0
3 13 2022-01-07 13:00:00 2022-01-23 17:00:00 5640.0