计算拆分工时?

Calculating split labor hours?

我需要为一些报告计算拆分工时,并且一直在努力解决这个问题。我们的 Oracle 数据库目前使用包中的存储过程来执行此操作,但我在这个项目的 Snowflake/dbt 范围内工作,虽然我可以将数据从 Oracle 中的视图带到 Snowflake,但我想在重新计算分段时间之前对时间进行一些转换,因为我们经常有人忘记打卡,我想在轮班结束时更正这些。所以我必须在调整错误的时钟输出后重新计算分段时间。

我的数据是这样的:

emp_id task_id time_in time_out
43 159221 2021-09-30 09:00:00 2021-09-30 14:30:00
43 188239 2021-09-30 09:16:03 2021-09-30 10:44:51
43 177741 2021-09-30 13:12:49 2021-09-30 14:14:32

停在这里并总结时间得到:

emp_id task_id hrs
43 159221 5.5
43 188239 1.47
43 177741 1.03

这是错误的,因为花费的一些时间与其他任务共享,应该分开。

我需要做的是,如果我们要计算第一个任务的总小时数,则公式为:

.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 小时 task_id 159221.

在 SQL 中有什么方法可以处理这个问题,还是我必须学习 JavaScript 并编写存储过程?

其他详细信息:

您可以尝试以下方法。

在聚合结果以从实际任务持续时间中减去总重叠持续时间之前,使用了 LEFT JOIN 来确定重叠持续时间。

SELECT
     emp_id,
     task_id,
     MAX(task_duration) - SUM(overlap_duration/2) as hours
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id;
emp_id task_id hours
43 159221 4.245694444444444
43 177741 1.0286111111111111
43 188239 1.48

View working demo on DB Fiddle

我还在下面包含了子查询的输出供您仔细阅读

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in time_out task_duration overlap_duration overlap_task
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0

View demo on DB Fiddle

编辑 1

正在回复问题更新

Additional details:

  1. The # of tasks that can overlap is variable, it's not only up to 2.
  2. Employees may clock into the same task multiple times throughout the day

我添加了更多示例数据并提供了更新的 fiddle。我无法提供带有雪花的演示 fiddle,但我们可以将 EXTRACT 更改为 DATEDIFF,如前所示。

CREATE TABLE task_times (
  emp_id INTEGER,
  task_id INTEGER,
  time_in TIMESTAMP,
  time_out TIMESTAMP
);

INSERT INTO task_times
  (emp_id, task_id, time_in, time_out)
VALUES
-- add repetition of the same tasks with overlaps
  ('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
--- add overlap of different tasks
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
  ('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
-- add multiple overalapping tasks with a difference of 10 minutes
  ('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
  ('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
  ('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
  ('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
-- add multiple log in and log out times with no overlaps
  ('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
  ('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
  ('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
  ;

确定重叠的子查询 (用于调试目的)

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in time_out task_duration overlap_duration overlap_task
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.4633333333333334 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:43:51.000Z 1.4633333333333334 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.5 333334
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.3333333333333335 333335
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.1666666666666665 333336
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3 333337
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 0.5 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.3333333333333335 333335
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.1666666666666665 333336
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 0.5 333337
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3.1666666666666665 333336
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3 333337
43 333336 2021-10-30T09:00:00.000Z 2021-10-30T12:10:00.000Z 3.1666666666666665 3 333337
43 333337 2021-10-30T09:00:00.000Z 2021-10-30T12:00:00.000Z 3 0
43 333337 2021-10-30T12:30:00.000Z 2021-10-30T13:00:00.000Z 0.5 0
43 333337 2021-10-30T13:30:00.000Z 2021-10-30T14:00:00.000Z 0.5 0

实际查询 (我只包含 mins 列用于调试目的)

SELECT
     emp_id,
     task_id,
     MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration) as hrs,
     (MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration))*60 as mins
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id
ORDER BY
    emp_id, task_id;
emp_id task_id hrs mins
43 159221 4.259351851851852 255.5611111111111
43 177741 1.0286111111111111 61.71666666666667
43 188239 2.9433333333333334 176.6
43 333333 0.9666666666666663 57.99999999999998
43 333334 1 60
43 333335 0.25000000000000044 15.000000000000027
43 333336 0.16666666666666652 9.999999999999991
43 333337 4 240

View on DB Fiddle

让我知道这是否适合你。