Split Time Off Hours (Working) by Week (Monday - Sunday) in Python
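I have time off data for each employee. I need to split it by week (Monday - Sunday), but before that I need to calculate the WORKING days off and the hours per WORKING day off, so that if the time off starts in the middle of a week (e.g. on a Wednesday), we know that only the hours for 3 working days (Wednesday, Thursday, and Friday) are allocated to that week.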
Id | Name | Date Start | Date End | Time Off Hours |
---|---|---|---|---|
1 | Tom Holland | 2022-04-22 | 2022-05-06 | 88.0 |
I was able to exclude weekends and calculate the working days off and the hours per working day off.
import numpy as np
import pandas as pd

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)
time_diff = []
for i in df.index:
    # busday_count excludes the end date; the + 1 adds it back, assuming it is a working day
    time_diff.append(np.busday_count(df["Date Start"][i], df["Date End"][i], weekmask=[1,1,1,1,1,0,0]) + 1)
df["Days Off (Working)"] = time_diff
df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]
The output is:
Id | Name | Date Start | Date End | Time Off Hours | Days Off (Working) | Hours per Days Off (Working) |
---|---|---|---|---|---|---|
1 | Tom Holland | 2022-04-22 | 2022-05-06 | 88.0 | 11 | 8.0 |
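The row-by-row loop above can also be written as a single vectorized call. The following is a minimal sketch, not part of the original question: it assumes the date columns parse cleanly with pd.to_datetime, and it shifts the end date forward by one day instead of adding 1 to the count, so the result stays correct even when the end date falls on a weekend.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": [1],
    "Name": ["Tom Holland"],
    "Date Start": ["2022-04-22"],
    "Date End": ["2022-05-06"],
    "Time Off Hours": [88.0],
})

# np.busday_count works on day-resolution datetime64 arrays
start = pd.to_datetime(df["Date Start"]).to_numpy().astype("datetime64[D]")
end = pd.to_datetime(df["Date End"]).to_numpy().astype("datetime64[D]")

# The end date is exclusive in busday_count, so move it one day forward
# to count the whole range inclusively (default weekmask is Mon-Fri)
working_days = np.busday_count(start, end + np.timedelta64(1, "D"))

df["Days Off (Working)"] = working_days
df["Hours per Days Off (Working)"] = df["Time Off Hours"] / df["Days Off (Working)"]
print(df[["Days Off (Working)", "Hours per Days Off (Working)"]])
```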
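Now I need to split this record into 3 records (in this example), because the date range from 2022-04-22 to 2022-05-06 spans 3 weeks (Monday - Sunday):
- Week from 2022-04-18 to 2022-04-24 (1 working day off = 8 hours)
- Week from 2022-04-25 to 2022-05-01 (5 working days off = 40 hours)
- Week from 2022-05-02 to 2022-05-08 (5 working days off = 40 hours)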
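The three week windows listed above can be derived from the start and end dates alone. A small sketch, assuming pandas weekly periods (the "W" frequency is anchored on Sunday by default, so each period runs Monday through Sunday); the variable names are only illustrative:

```python
import pandas as pd

start = pd.Timestamp("2022-04-22")
end = pd.Timestamp("2022-05-06")

# Monday of the week containing the start date and of the week containing the end date
first_monday = start.to_period("W").start_time
last_monday = end.to_period("W").start_time

# One Monday per week touched by the range, and the matching Sundays
week_starts = pd.date_range(first_monday, last_monday, freq="7D")
week_ends = week_starts + pd.Timedelta(days=6)

for week_start, week_end in zip(week_starts, week_ends):
    print(week_start.date(), week_end.date())
```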
The desired output should look like:
Id | Name | Week Start | Week End | Days Off (Working) | Hours per Days Off (Working) | Total Off Hours |
---|---|---|---|---|---|---|
1 | Tom Holland | 2022-04-18 | 2022-04-24 | 1 | 8.0 | 8.0 |
1 | Tom Holland | 2022-04-25 | 2022-05-01 | 5 | 8.0 | 40.0 |
1 | Tom Holland | 2022-05-02 | 2022-05-08 | 5 | 8.0 | 40.0 |
This is not the most concise approach, but it gets the job done. First, I created your sample df:
import pandas as pd

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)
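Then I created a helper function that is used later to calculate the working days in each week. It takes Date Start and Date End into account in case the time off starts or ends in the middle of a week.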
# You could try to use np.select to optimize this part
def get_work_days(row: pd.Series) -> int:
    start = row["Date Start"]
    end = row["Date End"]
    week_start = row["Week Start"]
    week_end = row["Week End"]
    if week_start <= start <= week_end:
        bdays = len(pd.bdate_range(start, week_end))
    elif week_start <= end <= week_end:
        bdays = len(pd.bdate_range(week_start, end))
    # Note: this branch is never reached, because the first check already
    # matches whenever both dates fall within the week
    elif week_start <= end <= week_end and week_start <= start <= week_end:
        bdays = len(pd.bdate_range(start, end))
    else:
        bdays = len(pd.bdate_range(week_start, week_end))
    return bdays
Finally, the processing part, which returns the output you want:
def process_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    # Making sure that these columns are datetime
    df["Date Start"] = pd.to_datetime(df["Date Start"])
    df["Date End"] = pd.to_datetime(df["Date End"])
    # Calculating working days between date start and date end
    df["Days Off (Working)"] = df.apply(lambda row: len(pd.bdate_range(row["Date Start"], row["Date End"])), axis=1)
    df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]
    # Creating Week Start values (the Monday of every week the time off touches)
    df["Week Start"] = df.apply(
        lambda row: pd.date_range(
            start=row["Date Start"].to_period("W").start_time,
            end=row["Date End"].to_period("W").start_time,
            freq="7D"
        ),
        axis=1
    )
    # Creating Week End values (the matching Sundays)
    df["Week End"] = df.apply(
        lambda row: pd.date_range(
            start=row["Date Start"].to_period("W").end_time,
            end=row["Date End"].to_period("W").end_time,
            freq="7D"
        ),
        axis=1
    )
    # Exploding the values, since each cell holds a DatetimeIndex
    # (exploding several columns at once needs pandas 1.3 or later)
    df = df.explode(["Week Start", "Week End"])
    # .end_time carries a 23:59:59.999999999 time of day, so keep only the date.
    # pd.to_datetime also restores the datetime dtype if explode left object columns.
    df["Week End"] = pd.to_datetime(df["Week End"]).dt.normalize()
    df["Week Start"] = pd.to_datetime(df["Week Start"]).dt.normalize()
    # Using the helper function to calculate the working days
    df["Days Off (Working)"] = df.apply(get_work_days, axis=1)
    df["Total Off Hours"] = df["Days Off (Working)"] * df["Hours per Days Off (Working)"]
    return df[["Id", "Name", "Week Start", "Week End", "Days Off (Working)", "Hours per Days Off (Working)", "Total Off Hours"]]
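As a cross-check, the same split can be sketched with a different technique: list every working day of the time off, map each day to the Monday of its week, and count the days per week. This is an alternative sketch, not the method used in this answer; split_time_off_by_week is a hypothetical name, and the column names are assumed to match the sample data.

```python
import pandas as pd


def split_time_off_by_week(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per record and Monday-Sunday week."""
    records = []
    for _, row in df.iterrows():
        # Every working day (Mon-Fri) of the time off, both ends inclusive
        days = pd.bdate_range(row["Date Start"], row["Date End"])
        if len(days) == 0:
            continue  # no working days in the range, nothing to allocate
        hours_per_day = row["Time Off Hours"] / len(days)
        # Map each working day to the Monday that starts its week
        week_starts = pd.Series(days.to_period("W").start_time)
        for week_start, n_days in week_starts.value_counts().sort_index().items():
            records.append({
                "Id": row["Id"],
                "Name": row["Name"],
                "Week Start": week_start,
                "Week End": week_start + pd.Timedelta(days=6),
                "Days Off (Working)": int(n_days),
                "Hours per Days Off (Working)": hours_per_day,
                "Total Off Hours": int(n_days) * hours_per_day,
            })
    return pd.DataFrame(records)


sample = pd.DataFrame({
    "Id": [1],
    "Name": ["Tom Holland"],
    "Date Start": ["2022-04-22"],
    "Date End": ["2022-05-06"],
    "Time Off Hours": [88.0],
})
result = split_time_off_by_week(sample)
print(result)
```

The explicit loop trades speed for readability, so it is better suited to checking results on a small sample than to large datasets.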
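Edit
A quick fix to the get_work_days function. We need to check first whether both the start date and the end date fall within the week, and only then check each date individually, so the updated version looks like: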
def get_work_days(row: pd.Series) -> int:
    start = row["Date Start"]
    end = row["Date End"]
    week_start = row["Week Start"]
    week_end = row["Week End"]
    if week_start <= end <= week_end and week_start <= start <= week_end:
        bdays = len(pd.bdate_range(start, end))
    elif week_start <= start <= week_end:
        bdays = len(pd.bdate_range(start, week_end))
    elif week_start <= end <= week_end:
        bdays = len(pd.bdate_range(week_start, end))
    else:
        bdays = len(pd.bdate_range(week_start, week_end))
    return bdays
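The four branches above amount to intersecting the time off range with the week and counting the working days in the overlap. A minimal sketch of that idea follows; work_days_in_week is a hypothetical helper, not part of the answer, and it assumes all four arguments are pandas Timestamps.

```python
import pandas as pd


def work_days_in_week(start: pd.Timestamp, end: pd.Timestamp,
                      week_start: pd.Timestamp, week_end: pd.Timestamp) -> int:
    """Count working days (Mon-Fri) in the overlap of [start, end] and the week."""
    first = max(start, week_start)  # later of the two lower bounds
    last = min(end, week_end)       # earlier of the two upper bounds
    if first > last:
        return 0  # the time off does not touch this week at all
    return len(pd.bdate_range(first, last))


start, end = pd.Timestamp("2022-04-22"), pd.Timestamp("2022-05-06")
weeks = [
    (pd.Timestamp("2022-04-18"), pd.Timestamp("2022-04-24")),
    (pd.Timestamp("2022-04-25"), pd.Timestamp("2022-05-01")),
    (pd.Timestamp("2022-05-02"), pd.Timestamp("2022-05-08")),
]
counts = [work_days_in_week(start, end, ws, we) for ws, we in weeks]
print(counts)
```

Unlike the branch-based version, this form also returns 0 for a week that lies entirely outside the time off range.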