Group islands of contiguous dates, including missing weekends
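I have a large dataset containing the dates of certain actions, and I am trying to count consecutive dates. Searching around, I found this article: https://www.sqlservercentral.com/articles/group-islands-of-contiguous-dates-sql-spackle. It is almost perfect and does what I am looking for. Unfortunately, my dataset comes with an unusual business rule that the query needs to enforce: if an employee's last date is a Friday and the next start date is the following Monday, those dates should be grouped into the same "island" without adding the weekend days to the count. Here is a sample dataset showing what I mean: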
CREATE TABLE Actions
([Employee] varchar(2), [ActionDate] date)
;
INSERT INTO Actions
([Employee], [ActionDate])
VALUES
('AA', '2019-01-03'),
('AA', '2019-01-04'),
('AA', '2019-01-07'),
('AA', '2019-01-08'),
('BB', '2019-08-01'),
('BB', '2019-08-02'),
('BB', '2019-08-03'),
('BB', '2019-08-04'),
('BB', '2019-08-05'),
('BB', '2019-08-06'),
('CC', '2019-09-09'),
('CC', '2019-09-10'),
('CC', '2019-09-11'),
('CC', '2019-09-12'),
('CC', '2019-09-13'),
('CC', '2019-09-16'),
('CC', '2019-09-17'),
('CC', '2019-09-18')
;
The query I found, with the columns changed to match the example:
WITH days AS
(
    SELECT Employee,
           ActionDate,
           DATEADD(dd, -ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, ActionDate), ActionDate) AS grouping
    FROM Actions
    GROUP BY Employee, ActionDate
)
SELECT Employee,
       MIN(ActionDate) AS ActionStart,
       MAX(ActionDate) AS ActionEnd,
       DATEDIFF(dd, MIN(ActionDate), MAX(ActionDate)) + 1 AS ActLength
FROM days
GROUP BY Employee, grouping
ORDER BY Employee, ActionStart
The result is:
+-------+----------+-------------+------------+-----------+
| RowNr | Employee | ActionStart | ActionEnd | ActLength |
+-------+----------+-------------+------------+-----------+
| 1 | AA | 03.01.2019 | 04.01.2019 | 2 |
| 2 | AA | 07.01.2019 | 08.01.2019 | 2 |
| 3 | BB | 01.08.2019 | 06.08.2019 | 6 |
| 4 | CC | 09.09.2019 | 13.09.2019 | 5 |
| 5 | CC | 16.09.2019 | 18.09.2019 | 3 |
+-------+----------+-------------+------------+-----------+
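In this example, employee AA has an end date of Friday, 04.01.2019, and the next start date, 07.01.2019, is the following Monday. CC likewise has an end date of Friday, 13.09.2019, and a next start date of Monday, 16.09.2019. The query should "merge" these islands without adding the weekend days to ActLength. So the desired result is: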
+-------+----------+-------------+------------+-----------+
| RowNr | Employee | ActionStart | ActionEnd | ActLength |
+-------+----------+-------------+------------+-----------+
| 1 | AA | 03.01.2019 | 08.01.2019 | 4 |
| 2 | BB | 01.08.2019 | 06.08.2019 | 6 |
| 3 | CC | 09.09.2019 | 18.09.2019 | 8 |
+-------+----------+-------------+------------+-----------+
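Does anyone know how to build such a rule into this kind of SQL query? I tried looking around, but usually people want to exclude weekends. Many thanks to everyone.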
I find it easier to use lag() and a window sum to implement the logic you want:
select employee, min(actionDate) actionStart, max(actionDate) actionEnd, count(*) actionLength
from (
    select
        a.*,
        sum(
            case when actionDate = dateadd(day, 1, lagActionDate)
                   or (actionDate = dateadd(day, 3, lagActionDate) and datename(weekday, actionDate) = 'Monday')
                 then 0 else 1 end
        ) over(partition by employee order by actionDate) grp
    from (
        select
            a.*,
            lag(actionDate) over(partition by employee order by actionDate) lagActionDate
        from actions a
    ) a
) a
group by employee, grp
employee | actionStart | actionEnd | actionLength
:------- | :---------- | :--------- | -----------:
AA | 2019-01-03 | 2019-01-08 | 4
BB | 2019-08-01 | 2019-08-06 | 6
CC | 2019-09-09 | 2019-09-18 | 8
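One caveat with the query above: datename(weekday, ...) returns the day name in the session's language, so the comparison with 'Monday' only holds under English language settings. Below is a minimal sketch of a variant that should not depend on the language or DATEFIRST settings. It assumes 1900-01-01 (a Monday) as a fixed anchor date, so the number of days since that date is divisible by 7 exactly on Mondays. Like the original, it also assumes each employee has at most one row per date.

```sql
-- Sketch: same lag() + running-sum approach as above, but with a
-- settings-independent Monday test.
-- Assumption: 1900-01-01 is used as an anchor because it was a Monday.
SELECT employee,
       MIN(actionDate) AS actionStart,
       MAX(actionDate) AS actionEnd,
       COUNT(*)        AS actionLength   -- counts rows, so weekend gaps add nothing
FROM (
    SELECT a.*,
           -- Running sum of "new island" flags: 0 continues the current
           -- island, 1 starts a new one (also for the first row, where
           -- lagActionDate is NULL and both comparisons are unknown).
           SUM(CASE
                   WHEN actionDate = DATEADD(day, 1, lagActionDate) THEN 0
                   WHEN actionDate = DATEADD(day, 3, lagActionDate)
                        AND DATEDIFF(day, '19000101', actionDate) % 7 = 0 THEN 0
                   ELSE 1
               END) OVER (PARTITION BY employee ORDER BY actionDate) AS grp
    FROM (
        SELECT a.*,
               LAG(actionDate) OVER (PARTITION BY employee ORDER BY actionDate) AS lagActionDate
        FROM Actions a
    ) a
) a
GROUP BY employee, grp
ORDER BY employee, actionStart;
```

The added ORDER BY only makes the row order deterministic; the grouping logic is otherwise unchanged from the query above.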