SQL distinct Worked Dates across Projects excluding Break Dates

Consider the following schema:

CREATE TABLE `Project Assignment`
    (`Employee` varchar(1), `Project Id` int, `Project Assignment Date` date, `Project Relieving Date` date)
;

INSERT INTO `Project Assignment`
    (`Employee`, `Project Id`, `Project Assignment Date`, `Project Relieving Date`)
VALUES
    ('A', 1, '2018-04-01', '2019-12-25'),
    ('A', 2, '2019-06-15', '2020-03-31'),
    ('A', 3, '2019-09-07', '2020-05-20'),
    ('A', 4, '2020-07-14', '2020-12-15')
;


CREATE TABLE `Break`
    (`Break Id` int, `Employee` varchar(1), `Project Id` int, `Break Start Date` date, `Break End Date` date)
;

INSERT INTO `Break`
    (`Break Id`, `Employee`, `Project Id`, `Break Start Date`, `Break End Date`)
VALUES
    (1, 'A', 1, '2018-09-01', '2018-09-30'),
    (2, 'A', 1, '2019-10-05', '2019-11-30'),
    (3, 'A', 2, '2019-10-15', '2019-11-15'),
    (4, 'A', 3, '2019-11-01', '2019-11-10'),
    (5, 'A', 2, '2020-01-01', '2020-01-10'),
    (6, 'A', 3, '2020-01-01', '2020-01-10')
;

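An employee can take one or more breaks in each project while assigned to it. Breaks do not overlap within a project, but they can overlap across projects.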

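We want to calculate the number of days on which an employee was assigned to at least one project, minus the number of days on which the employee was on break in all of the projects assigned at that time.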

I was able to derive the distinct number of days the employee was assigned to projects using the following query:

SELECT merged.employee,
    SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
FROM (SELECT
        s1.employee, s1.`Project Assignment Date`,
        MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
    FROM `Project Assignment` s1
    INNER JOIN `Project Assignment` t1
        ON t1.employee = s1.employee
        AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
        AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
            WHERE t2.employee = s1.employee 
                AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date` 
                AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
    WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
        WHERE s2.employee = s1.employee
            AND s1.`Project Assignment Date` > s2.`Project Assignment Date` 
            AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
    GROUP BY s1.employee, s1.`Project Assignment Date`
    ORDER BY s1.`Project Assignment Date`) merged
GROUP BY merged.employee

Result:

| employee | assigned_days |
| -------- | ------------- |
| A        | 936           |
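
As a sanity check, the 936 figure can be reproduced by hand from the sample data. Projects 1 to 3 overlap into a single span and project 4 stands alone, so a minimal sketch (with the two spans hard-coded from the INSERT above, and hypothetical column aliases) is:

```sql
-- Projects 1-3 merge into one span: 2018-04-01 .. 2020-05-20.
-- Project 4 is a separate span:     2020-07-14 .. 2020-12-15.
-- DATEDIFF excludes the start day, hence the + 1 for inclusive counting.
SELECT DATEDIFF('2020-05-20', '2018-04-01') + 1 AS span_1_days,
       DATEDIFF('2020-12-15', '2020-07-14') + 1 AS span_2_days;
-- span_1_days = 781, span_2_days = 155, and 781 + 155 = 936
```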

But I cannot think of a way to derive the number of days on which the person was on break in all assigned projects.

Expected result:

+----------+---------------+------------+-------------+
| employee | assigned_days | break_days | worked_days |
+==========+===============+============+=============+
| A        | 936           | 50         | 886         |
+----------+---------------+------------+-------------+

MariaDB 10.3.29

How break_days is worked out:

+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| Employee | Project | Break Start | Break End        | Days Considered | Remarks                                                                                                           |
+==========+=========+=============+==================+=================+===================================================================================================================+
| A        | 1       |  2018-09-01 |  2018-09-30      | 30              | Only one project assigned so consider whole break                                                                 |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A        | 1       |  2019-10-05 |  2019-11-30      | 10              | 3 Projects were assigned during these breaks. The common days of break fall between 2019-11-01 and 2019-11-10     |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 2       |  2019-10-15 |  2019-11-15      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 3       |  2019-11-01 |  2019-11-10      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A        | 2       |  2020-01-01 |  2020-01-10      | 10              | 2 Projects were assigned during this time and break in both projects                                              |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 3       |  2020-01-01 |  2020-01-10      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
|          |         |             | Total Break Days | 50              |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+

Link to DB-Fiddle: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/0

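Use recursive CTEs to expand each employee's project assignments and breaks into one row per date.
Then, for each date in both cases, aggregate with GROUP_CONCAT() to get all the projects as a comma-separated list.
If the two lists match for a given date, the employee is on break in every assigned project, so that date is a break date.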

WITH RECURSIVE 
  working_dates AS (
    SELECT `Employee`, `Project Id`, `Project Assignment Date` AS date, `Project Relieving Date`
    FROM `Project Assignment`
    UNION ALL
    SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Project Relieving Date`
    FROM working_dates
    WHERE date < `Project Relieving Date`
  ),
  break_dates AS (
    SELECT `Employee`, `Project Id`, `Break Start Date` AS date, `Break End Date`
    FROM `Break`
    UNION ALL
    SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Break End Date`
    FROM break_dates
    WHERE date < `Break End Date`
  ),
  working AS (
    SELECT `Employee`, date,
           GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
    FROM working_dates
    GROUP BY `Employee`, date 
  ),
  breaks AS (
    SELECT `Employee`, date,
           GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
    FROM break_dates
    GROUP BY `Employee`, date
  )
SELECT w.`Employee`,
       COUNT(*) assigned_days, 
       COUNT(b.date) AS break_days,
       COUNT(*) - COUNT(b.date) worked_days
FROM working w LEFT JOIN breaks b
ON w.`Employee` = b.`Employee` AND w.date = b.date AND w.projects = b.projects
GROUP BY w.`Employee`

See the demo.
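
To see why comparing the two lists works, it can help to look at individual dates. The sketch below is only an illustration, assuming the tables from the question are loaded; the two probe dates and the alias `probe_date` are picked here for the example and are not part of the answer's query.

```sql
-- For each probe date, list the projects the employee is assigned to
-- and the projects in which the employee is on break.
SELECT d.probe_date,
       (SELECT GROUP_CONCAT(pa.`Project Id` ORDER BY pa.`Project Id`)
        FROM `Project Assignment` pa
        WHERE pa.`Employee` = 'A'
          AND d.probe_date BETWEEN pa.`Project Assignment Date`
                               AND pa.`Project Relieving Date`) AS assigned_projects,
       (SELECT GROUP_CONCAT(b.`Project Id` ORDER BY b.`Project Id`)
        FROM `Break` b
        WHERE b.`Employee` = 'A'
          AND d.probe_date BETWEEN b.`Break Start Date`
                               AND b.`Break End Date`) AS break_projects
FROM (SELECT CAST('2019-10-20' AS DATE) AS probe_date
      UNION ALL
      SELECT CAST('2019-11-05' AS DATE)) d
ORDER BY d.probe_date;
-- 2019-10-20: assigned_projects = '1,2,3', break_projects = '1,2'   -> not a break day
-- 2019-11-05: assigned_projects = '1,2,3', break_projects = '1,2,3' -> break day
```

The ORDER BY inside GROUP_CONCAT() matters: without it the two lists could contain the same projects in a different order and fail to compare equal.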

After adding a Break Id column to the Break table, I was able to derive the break days based on the aggregation technique suggested by @forpass:

Then, for each date in both cases, with aggregation get all the projects as a comma separated list with GROUP_CONCAT().

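For each break, get the count and list of overlapping projects (using GROUP_CONCAT). Then join the result with Break again to find the count and list of overlapping breaks, along with the smallest common overlap (latest start and earliest end). Use ROW_NUMBER to eliminate duplicates.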
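
The "latest start and earliest end" step can be illustrated on its own. This is a minimal sketch against the sample data in the question, with breaks 2, 3 and 4 selected by hand and hypothetical column aliases; the full query finds the overlapping breaks itself.

```sql
-- Breaks 2, 3 and 4 belong to projects 1, 2 and 3 and overlap in late 2019.
-- Their common window runs from the latest start to the earliest end.
SELECT MAX(`Break Start Date`) AS common_start,
       MIN(`Break End Date`)   AS common_end,
       DATEDIFF(MIN(`Break End Date`), MAX(`Break Start Date`)) + 1 AS common_days
FROM `Break`
WHERE `Break Id` IN (2, 3, 4);
-- common_start = 2019-11-01, common_end = 2019-11-10, common_days = 10
```

If the latest start falls after the earliest end, the breaks share no common day and `common_days` comes out as zero or negative, so such a result should not be counted.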

I moved the query for Assigned Days into another CTE and joined it with the breaks CTE to get the desired result.

WITH breaks_summary AS (
    SELECT `Employee`, SUM(break_days) break_days
    FROM (      
        SELECT b.`Employee`, DATEDIFF(b.end_date, b.start_date)+1 break_days, ROW_NUMBER() OVER (PARTITION BY b.break_ids) rn, overlapping_breaks, break_ids, projects_count
        FROM (
            SELECT b_p_cnt.`Employee`, b_p_cnt.`Project Id`, b_p_cnt.projects_count, 
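            -- Order the ids inside GROUP_CONCAT so the same set of breaks always yields the same string for the ROW_NUMBER partition
            COUNT(b2.`Break Id`) overlapping_breaks, GROUP_CONCAT(b2.`Break Id` ORDER BY b2.`Break Id`) break_ids, MAX(b2.start_date) start_date, MIN(b2.end_date) end_date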
            FROM (
                SELECT b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date, GROUP_CONCAT(pa.`Project Id`) projects, count(pa.`Project Id`) projects_count
                FROM (
                    SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
                    FROM `Break` 
                    ) b1
                LEFT JOIN `Project Assignment` pa ON b1.`Employee` = pa.`Employee`
                    AND ((b1.start_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE()))
                        OR (b1.end_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE())))
                GROUP BY b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date) b_p_cnt
            LEFT JOIN (
                SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
                FROM `Break`
                ORDER BY `Break Id`) b2 ON b_p_cnt.`Employee` = b2.`Employee` 
                    AND ((b_p_cnt.start_date BETWEEN b2.start_date AND b2.end_date)
                        OR (b_p_cnt.end_date BETWEEN b2.start_date AND b2.end_date))
            GROUP BY b_p_cnt.`Break Id`, b_p_cnt.`Employee`, b_p_cnt.`Project Id`, 
                b_p_cnt.start_date, b_p_cnt.end_date, b_p_cnt.projects, b_p_cnt.projects_count
            HAVING count(b2.`Break Id`) = b_p_cnt.projects_count
            ORDER BY b_p_cnt.`Employee`, `Project Id`) b        
            ) breaks
    WHERE rn = 1
    GROUP BY `Employee`),   
assigned AS (
    SELECT merged.`Employee`, SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
            FROM (SELECT s1.`Employee`, s1.`Project Assignment Date`,
                    MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
                FROM `Project Assignment` s1
                INNER JOIN `Project Assignment` t1 ON t1.`Employee` = s1.`Employee`
                    AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
                    AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
                        WHERE t2.`Employee` = s1.`Employee`
                            AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date` 
                            AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
                WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
                    WHERE s2.`Employee` = s1.`Employee`
                        AND s1.`Project Assignment Date` > s2.`Project Assignment Date` 
                        AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
                GROUP BY s1.`Employee`, s1.`Project Assignment Date`
                ORDER BY s1.`Project Assignment Date`) merged
        GROUP BY merged.`Employee`)
SELECT ad.`Employee`,
    ad.assigned_days,
    IFNULL(bs.break_days,0) break_days,
    (ad.assigned_days - IFNULL(bs.break_days,0)) worked_days
FROM assigned ad
LEFT JOIN breaks_summary bs ON ad.`Employee` = bs.`Employee`

Updated DB-Fiddle with the query: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/3

Thanks to everyone who contributed by improving the question and offering possible answers.