Split Time Off Hours (Working) by Week (Monday - Sunday) in Python

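I have time-off data for each employee. I need to split it by week (Monday to Sunday), but before that I need to calculate the number of WORKING days off and the hours per WORKING day off. That way, if a time off starts in the middle of a week (e.g. on a Wednesday), we know that only the hours of 3 working days (Wednesday, Thursday and Friday) are allocated to that week.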

Id  Name         Date Start  Date End    Time Off Hours
1   Tom Holland  2022-04-22  2022-05-06  88.0

I was able to exclude weekends and calculate the number of working days off and the hours per working day off:

import numpy as np
import pandas as pd

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)

time_diff = []
for i in df.index:
    # np.busday_count does not count the end date itself, hence the + 1
    time_diff.append(np.busday_count(df["Date Start"][i], df["Date End"][i], weekmask=[1,1,1,1,1,0,0]) + 1)

df["Days Off (Working)"] = time_diff
df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]

The output is:

Id  Name         Date Start  Date End    Time Off Hours  Days Off (Working)  Hours per Days Off (Working)
1   Tom Holland  2022-04-22  2022-05-06  88.0            11                  8.0
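The `+ 1` above is there because `np.busday_count` treats its range as half-open (the end date is excluded), while `pd.bdate_range` includes both endpoints. The short check below illustrates this with the example dates. It also shows a caveat, assuming a leave could ever end on a weekend day: in that case `+ 1` overcounts, whereas `len(pd.bdate_range(...))` does not.

```python
import numpy as np
import pandas as pd

start, end = "2022-04-22", "2022-05-06"  # dates from the example record

# np.busday_count counts weekdays in the half-open range [start, end),
# so the end date itself is never counted.
exclusive = int(np.busday_count(start, end, weekmask=[1, 1, 1, 1, 1, 0, 0]))

# pd.bdate_range includes both endpoints (Mon-Fri by default).
inclusive = len(pd.bdate_range(start, end))
print(exclusive, exclusive + 1, inclusive)  # 10 11 11

# If the range ended on a weekend day, "+ 1" would overcount.
sat_end = "2022-05-07"  # a Saturday, used only for illustration
print(int(np.busday_count(start, sat_end)) + 1, len(pd.bdate_range(start, sat_end)))  # 12 11
```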

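Now I need to split this record into 3 records (in this example), because the date range 2022-04-22 to 2022-05-06 spans 3 weeks (Monday to Sunday):

  1. Week of 2022-04-18 to 2022-04-24 (1 working day off = 8 hours)
  2. Week of 2022-04-25 to 2022-05-01 (5 working days off = 40 hours)
  3. Week of 2022-05-02 to 2022-05-08 (5 working days off = 40 hours)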
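The three Monday to Sunday windows listed above can be derived directly from the start and end dates. A minimal sketch: step back from each date to its Monday using `dayofweek` (Monday is 0), then generate every Monday in between at 7-day intervals.

```python
import pandas as pd

start = pd.Timestamp("2022-04-22")  # a Friday
end = pd.Timestamp("2022-05-06")    # a Friday

# Monday of the week containing each date (dayofweek: Monday=0 ... Sunday=6).
first_monday = start - pd.Timedelta(days=start.dayofweek)
last_monday = end - pd.Timedelta(days=end.dayofweek)

# One Monday per week touched by the leave, and the Sunday that closes it.
week_starts = pd.date_range(first_monday, last_monday, freq="7D")
week_ends = week_starts + pd.Timedelta(days=6)

for ws, we in zip(week_starts, week_ends):
    print(ws.date(), we.date())
# 2022-04-18 2022-04-24
# 2022-04-25 2022-05-01
# 2022-05-02 2022-05-08
```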

The desired output should look like:

Id  Name         Week Start  Week End    Days Off (Working)  Hours per Days Off (Working)  Total Off Hours
1   Tom Holland  2022-04-18  2022-04-24  1                   8.0                           8.0
1   Tom Holland  2022-04-25  2022-05-01  5                   8.0                           40.0
1   Tom Holland  2022-05-02  2022-05-08  5                   8.0                           40.0
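One way to reach the desired output above is a different technique from the answer below: expand each record into one row per business day off, then group those days by their Monday to Sunday week. This is only a sketch under stated assumptions. The `Record` column is a hypothetical helper key, added so that two separate leaves of the same employee are not merged. A record whose range contains no weekdays at all would simply produce no rows.

```python
import pandas as pd

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'],
        'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)

# One row per business day (Mon-Fri, both ends inclusive) of each record.
# "Record" is a hypothetical helper key identifying the original row.
daily = pd.DataFrame(
    [
        {**rec, "Record": i, "Day": day}
        for i, rec in enumerate(df.to_dict("records"))
        for day in pd.bdate_range(rec["Date Start"], rec["Date End"])
    ]
)
daily["Day"] = pd.to_datetime(daily["Day"])

# Spread each record's hours evenly over its working days.
days_per_record = daily.groupby("Record")["Day"].transform("size")
daily["Hours per Days Off (Working)"] = daily["Time Off Hours"] / days_per_record

# Monday (dayofweek 0) and Sunday of the week containing each day.
daily["Week Start"] = daily["Day"] - pd.to_timedelta(daily["Day"].dt.dayofweek, unit="D")
daily["Week End"] = daily["Week Start"] + pd.Timedelta(days=6)

# Collapse the daily rows back to one row per record and week.
out = (
    daily.groupby(["Record", "Id", "Name", "Week Start", "Week End"], as_index=False)
    .agg(**{
        "Days Off (Working)": ("Day", "count"),
        "Hours per Days Off (Working)": ("Hours per Days Off (Working)", "first"),
        "Total Off Hours": ("Hours per Days Off (Working)", "sum"),
    })
    .drop(columns="Record")
)
print(out)
```

The design trades memory (one row per day off) for not needing any per-week branching logic. Every day is assigned to exactly one week, so partial first and last weeks come out right automatically.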

This isn't the cleanest approach, but it gets the job done. First, I created your sample df:

import pandas as pd

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)

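Then I created a helper function that will later calculate the number of working days in each week. It also takes Date Start and Date End into account, in case the time off starts after the beginning of a week or ends before the end of one: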

# You could try to use np.select to optimize this part
def get_work_days(row: pd.Series) -> int:
  start = row["Date Start"]
  end = row["Date End"]
  week_start = row["Week Start"]
  week_end = row["Week End"]

  if week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, week_end))
  elif week_start <= end <= week_end:
    bdays = len(pd.bdate_range(week_start, end))
  elif week_start <= end <= week_end and week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, end))
  else:
    bdays = len(pd.bdate_range(week_start, week_end))
  
  return bdays
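The branches in `get_work_days` all compute the same thing: the part of the leave that overlaps the given week. An alternative sketch clamps the two intervals instead of branching. The later of the two starts and the earlier of the two ends bound the overlap. `get_work_days_clamped` is a hypothetical name, and it takes the four dates as plain arguments rather than a row.

```python
import pandas as pd

def get_work_days_clamped(start: pd.Timestamp, end: pd.Timestamp,
                          week_start: pd.Timestamp, week_end: pd.Timestamp) -> int:
    # The part of the leave inside this week is the overlap of
    # [start, end] and [week_start, week_end].
    lo = max(start, week_start)
    hi = min(end, week_end)
    if lo > hi:  # the leave does not touch this week at all
        return 0
    # pd.bdate_range counts Mon-Fri and includes both endpoints.
    return len(pd.bdate_range(lo, hi))

ts = pd.Timestamp
print(get_work_days_clamped(ts("2022-04-22"), ts("2022-05-06"), ts("2022-04-18"), ts("2022-04-24")))  # 1
print(get_work_days_clamped(ts("2022-04-22"), ts("2022-05-06"), ts("2022-04-25"), ts("2022-05-01")))  # 5
# Leave that starts and ends inside the same week (Tuesday to Thursday):
print(get_work_days_clamped(ts("2022-04-19"), ts("2022-04-21"), ts("2022-04-18"), ts("2022-04-24")))  # 3
```

In a DataFrame it could be applied as `df.apply(lambda r: get_work_days_clamped(r["Date Start"], r["Date End"], r["Week Start"], r["Week End"]), axis=1)`. Because there is no branch ordering to get wrong, the "start and end in the same week" case is handled without special treatment.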

Finally, the processing function returns your desired output:

def process_dataframe(df: pd.DataFrame) -> pd.DataFrame:
  # Work on a copy so the caller's DataFrame is not modified
  df = df.copy()
  # Making sure that these columns are datetime
  df["Date Start"] = pd.to_datetime(df["Date Start"])
  df["Date End"] = pd.to_datetime(df["Date End"])
  # Calculating working days (Mon-Fri, both ends inclusive) between Date Start and Date End
  df["Days Off (Working)"] = df.apply(lambda row: len(pd.bdate_range(row["Date Start"], row["Date End"])), axis=1)
  df["Hours per Days Off (Working)"] = df["Time Off Hours"] / df["Days Off (Working)"]
  # Creating Week Start values: the Monday of every week from Date Start to Date End
  # (to_period("W") uses weeks ending on Sunday, so start_time is a Monday)
  df["Week Start"] = df.apply(
      lambda row: pd.date_range(
          start=row["Date Start"].to_period("W").start_time,
          end=row["Date End"].to_period("W").start_time,
          freq="7D"
      ),
      axis=1
  )
  # Creating Week End values: the matching Sundays
  df["Week End"] = df.apply(
      lambda row: pd.date_range(
          start=row["Date Start"].to_period("W").end_time,
          end=row["Date End"].to_period("W").end_time,
          freq="7D"
      ),
      axis=1
  )
  # Exploding the values, since the way they were created made them a DatetimeIndex
  # field (exploding several columns at once requires pandas >= 1.3)
  df = df.explode(["Week Start", "Week End"])
  # .end_time is the last nanosecond of Sunday (23:59:59.999999999), so drop the time part
  df["Week End"] = pd.to_datetime(df["Week End"]).dt.normalize()
  df["Week Start"] = pd.to_datetime(df["Week Start"]).dt.normalize()
  # Using the helper function to calculate the working days in each week
  df["Days Off (Working)"] = df.apply(get_work_days, axis=1)
  df["Total Off Hours"] = df["Days Off (Working)"] * df["Hours per Days Off (Working)"]
  return df[["Id", "Name", "Week Start", "Week End", "Days Off (Working)", "Hours per Days Off (Working)", "Total Off Hours"]]
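The week boundaries in `process_dataframe` rely on how a weekly `Period` behaves. The default `"W"` frequency ends on Sunday, so `start_time` is the Monday at midnight and `end_time` is the very last nanosecond of the Sunday. That is the "weird time" the answer strips off. A small check on the example start date:

```python
import pandas as pd

day = pd.Timestamp("2022-04-22")  # a Friday
week = day.to_period("W")         # default weekly period ends on Sunday

print(week.start_time)  # Monday 2022-04-18 at 00:00
print(week.end_time)    # Sunday 2022-04-24 at 23:59:59.999999999

# Dropping the time-of-day part leaves the plain Sunday date.
print(week.end_time.normalize())  # 2022-04-24 00:00:00
```

On a datetime Series the same step is `series.dt.normalize()`, which keeps the `datetime64` dtype instead of going through Python `date` objects.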

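EDIT

A quick fix to the get_work_days function: we need to check first whether both the start date and the end date fall within the week, and only then check each date individually. The updated version looks like this: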

def get_work_days(row: pd.Series) -> int:
  start = row["Date Start"]
  end = row["Date End"]
  week_start = row["Week Start"]
  week_end = row["Week End"]
 
  if week_start <= end <= week_end and week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, end))
  elif week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, week_end))
  elif week_start <= end <= week_end:
    bdays = len(pd.bdate_range(week_start, end))
  else:
    bdays = len(pd.bdate_range(week_start, week_end))
  
  return bdays
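To see why the order of the checks matters, consider a hypothetical leave that starts and ends inside the same week (Tuesday 2022-04-19 to Thursday 2022-04-21). With the original ordering, the first branch matches on the start date alone and counts through to the end of the week. Below, `get_work_days_v1` and `get_work_days_v2` are illustrative names for the first and edited versions of the helper.

```python
import pandas as pd

def get_work_days_v1(row: pd.Series) -> int:
    # Branch order from the first version of the answer.
    start, end = row["Date Start"], row["Date End"]
    ws, we = row["Week Start"], row["Week End"]
    if ws <= start <= we:
        return len(pd.bdate_range(start, we))
    elif ws <= end <= we:
        return len(pd.bdate_range(ws, end))
    elif ws <= end <= we and ws <= start <= we:  # never reached
        return len(pd.bdate_range(start, end))
    return len(pd.bdate_range(ws, we))

def get_work_days_v2(row: pd.Series) -> int:
    # Branch order from the edit: "both dates in this week" is checked first.
    start, end = row["Date Start"], row["Date End"]
    ws, we = row["Week Start"], row["Week End"]
    if ws <= end <= we and ws <= start <= we:
        return len(pd.bdate_range(start, end))
    elif ws <= start <= we:
        return len(pd.bdate_range(start, we))
    elif ws <= end <= we:
        return len(pd.bdate_range(ws, end))
    return len(pd.bdate_range(ws, we))

row = pd.Series({
    "Date Start": pd.Timestamp("2022-04-19"),  # Tuesday
    "Date End": pd.Timestamp("2022-04-21"),    # Thursday
    "Week Start": pd.Timestamp("2022-04-18"),
    "Week End": pd.Timestamp("2022-04-24"),
})
print(get_work_days_v1(row), get_work_days_v2(row))  # 4 3
```

The first version wrongly includes Friday 2022-04-22; the edited version counts only Tuesday to Thursday.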