View to get the minimum date with a complicated condition
I have a table in SQL Server like this:
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
DateFrom: date not null -- unique for each EmployeeID
Completed: bit not null
EmployeeID: bigint not null
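- Each row belongs to a sub-period defined by its start date. A sub-period may or may not be completed.
- Each employee can have multiple sub-periods.
- A period is made up of a sequence of ordered sub-periods and lasts until its last sub-period is completed.
I want to create a view that returns the start date of the last period for each EmployeeID, as follows:
- If no row has Completed = true, get the minimum DateFrom. [The employee has one period that is not completed yet]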
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| false | 1 |
|2021-01-09| false | 1 |
|2021-01-10| false | 1 |
|2021-01-07| false | 2 |
|2021-01-15| false | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-01 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
- Otherwise, return the minimum DateFrom after the last row with Completed = true. [The last period is not completed yet]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| true | 1 |
|2021-01-09| false | 1 |
|2021-01-10| false | 1 |
|2021-01-07| true | 2 |
|2021-01-15| false | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-15 for EmployeeID = 2
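- If the row with the maximum DateFrom has Completed = true, return the minimum DateFrom among the rows after the preceding Completed = true row (if one exists), up to that last row. [The last period was completed with multiple sub-periods]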
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| true | 1 |
|2021-01-09| false | 1 |
|2021-01-10| true | 1 |
|2021-01-07| false | 2 |
|2021-01-15| true | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
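- If the row with the maximum DateFrom has Completed = true and either there are no other rows or the row before it also has Completed = true, return the maximum DateFrom. [The last period was completed with a single sub-period]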
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| false | 1 |
|2021-01-09| true | 1 |
|2021-01-10| true | 1 |
|2021-01-07| true | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-10 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
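Before writing the view, it can help to pin the four rules down as a small reference function and check it against the expected results above. The sketch below is in Python (not part of the original question), `last_period_start` is a hypothetical name, and it handles one employee's rows at a time, assuming DateFrom is unique per employee as stated.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def last_period_start(rows: Iterable[Tuple[date, bool]]) -> Optional[date]:
    """Start date of the last period for ONE employee (hypothetical helper).

    rows: (DateFrom, Completed) pairs; DateFrom is assumed unique per employee.
    """
    ordered = sorted(rows)  # ascending by DateFrom
    if not ordered:
        return None
    done = [i for i, (_, completed) in enumerate(ordered) if completed]
    if not done:
        # Rule 1: nothing completed -> minimum DateFrom
        return ordered[0][0]
    last_done = done[-1]
    if last_done < len(ordered) - 1:
        # Rule 2: last period still open -> first row after the last completed one
        return ordered[last_done + 1][0]
    # Rules 3 and 4: the latest row is completed, so the last period starts right
    # after the previous completed row, or at the first row if there is none.
    # Rule 4 falls out of this: if the row before is also completed, the
    # result is the latest row itself.
    previous_done = done[-2] if len(done) > 1 else -1
    return ordered[previous_done + 1][0]


# Third example from the question, EmployeeID = 1
example3 = [(date(2021, 1, 1), False), (date(2021, 1, 5), True),
            (date(2021, 1, 9), False), (date(2021, 1, 10), True)]
print(last_period_start(example3))  # 2021-01-09
```

Written this way, rules 3 and 4 collapse into one branch, which suggests a single SQL expression can cover both. A function like this could also serve as a test oracle for whichever view you end up with.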
I'm looking for the most optimized solution.
I tried this, but I get NULL values for the third example:
WITH T AS (
SELECT EmployeeID
, MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) MaxDateFrom
FROM TableDates
GROUP BY EmployeeID
)
SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) DateFrom
FROM T
LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
GROUP BY TableDates.EmployeeID
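To see where the NULL in the third example comes from, here is the same query run against that example's data. As an assumption for illustration, SQLite (through Python's built-in `sqlite3` module) stands in for SQL Server, with `bit` stored as INTEGER and `date` as ISO-8601 text.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; bit -> INTEGER, date -> ISO-8601 TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableDates (DateFrom TEXT, Completed INTEGER, EmployeeID INTEGER)")
# Third example from the question: each employee's latest row is completed.
conn.executemany(
    "INSERT INTO TableDates VALUES (?, ?, ?)",
    [("2021-01-01", 0, 1), ("2021-01-05", 1, 1), ("2021-01-09", 0, 1),
     ("2021-01-10", 1, 1), ("2021-01-07", 0, 2), ("2021-01-15", 1, 2)],
)
attempt = """
WITH T AS (
    SELECT EmployeeID,
           MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) AS MaxDateFrom
    FROM TableDates
    GROUP BY EmployeeID
)
SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) AS DateFrom
FROM T
LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
    AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
GROUP BY TableDates.EmployeeID
"""
rows = conn.execute(attempt).fetchall()
print(rows)  # [(None, None)]
```

Because MaxDateFrom is each employee's latest row here, `TableDates.DateFrom > T.MaxDateFrom` matches nothing. The LEFT JOIN keeps only NULL-extended rows, and grouping on the right-hand table's EmployeeID merges them into a single NULL row.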
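Here is a working query. It may be overly complicated, but I'll leave simplifying it to you.
It handles the following scenarios, all partitioned by EmployeeId as requested:
- Scenario 1: no row has Completed=1. This is detected with sum(convert(int, Completed)) over (partition by EmployeeId), and first_value(DateFrom) is used.
- Scenario 2: a Completed=1 row exists but is not the last row. Find the DateFrom of the most recent Completed=1 row with max(case when Completed = 1 then DateFrom else null end), then take the min(DateFrom) of all rows after it.
- Scenario 3, the tricky case: the last row has Completed=1 and the row before it has Completed=0, detected with last_value(Completed) and lag(Completed). Take the min(DateFrom) of the rows after the preceding Completed=1 row (if there is one), up to and including the last row; completed_seqnum = 1 marks exactly those rows.
- Scenario 4: the last row and the second-to-last row both have Completed=1, so the DateFrom of the last row is used.
If all the other options are NULL, coalesce falls back to the row's own DateFrom.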
declare @Test table (EmployeeId bigint not null, DateFrom date not null, Completed bit not null);
insert into @Test (EmployeeId, DateFrom, Completed)
values
-- Scenario 1
(1, '2021-01-01', 0),
(1, '2021-01-02', 0),
(1, '2021-01-03', 0),
-- Scenario 2
(2, '2021-01-01', 0),
(2, '2021-01-02', 1),
(2, '2021-01-03', 0),
(2, '2021-01-04', 0),
-- Scenario 3
(3, '2021-01-01', 0),
(3, '2021-01-02', 1),
(3, '2021-01-03', 0),
(3, '2021-01-04', 1),
-- Special case, single row
(4, '2021-01-01', 1),
-- Scenario 4 (last two rows completed)
(5, '2021-01-01', 0),
(5, '2021-01-02', 0),
(5, '2021-01-03', 1),
(5, '2021-01-04', 1);
with cte as (
select *
-- First value of DateFrom over all rows (not the default)
, first_value (DateFrom) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) FirstDateFrom
-- Last value of Completed over all rows (not the default)
, last_value (Completed) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompleted
-- Find the Date of the last row with Completed = 1
, max (case when Completed = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompletedNew
-- Regular row number
, row_number() over (partition by EmployeeId order by DateFrom desc) RowNumber
-- Total number of rows with Completed = 1
, sum(convert(int,Completed)) over (partition by EmployeeId) SumOfCompleted
-- Max value of DateFrom where Completed = 0
, max(case when Completed = 0 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) MaxDateFrom
-- Check the lagged complete to see if the last 2 rows are completed = 1
, lag(Completed) over (partition by EmployeeId order by DateFrom asc) LaggedComplete
-- Borrowed from Gordon to check which rows are prior to the last Completed = 1 and before the preceding Completed = 1
, sum(case when completed = 1 then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
from @Test
)
select
EmployeeId
-- Use the only DateFrom if there is only one
, coalesce(case
-- Scenario 1
when SumOfCompleted = 0 then FirstDateFrom
when LastCompleted = 1 then
case
-- Scenario 4
when coalesce(LaggedComplete,0) = 1 then DateFrom
-- Scenario 3
else Scenario3
end
-- Scenario 2
else ActualResult
end, DateFrom) FinalResult
--, * -- Uncomment to see the intermediate columns
from (
select *
-- Find the lowest DateFrom which is greater than the DateFrom of the last row where Completed = 1
, min(case when DateFrom > LastCompletedNew then DateFrom else null end) over (partition by EmployeeId) ActualResult
-- Find the min DateFrom over the rows between the last Completed=1 and the Completed=1 before it (if it exists)
, min(case when completed_seqnum = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) Scenario3
from cte
) x
-- Because we have calculated the same result for every row we just take the first
where RowNumber = 1
order by x.EmployeeId asc, x.DateFrom asc;
Note: This assumes there is only one row per date for each employee.
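The completed_seqnum column is what isolates the last period in Scenario 3: a running sum of Completed taken in descending date order stays at 1 from the last Completed=1 row back to, but not including, the Completed=1 row before it. A minimal sketch using employee 3 from the test data above, assuming SQLite 3.25 or later (needed for window functions) via Python's `sqlite3` as a stand-in for SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (EmployeeId INTEGER, DateFrom TEXT, Completed INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [(3, "2021-01-01", 0), (3, "2021-01-02", 1), (3, "2021-01-03", 0), (3, "2021-01-04", 1)],
)
# Running sum of Completed, newest row first (default frame: start of partition to current row)
seq_rows = conn.execute("""
    SELECT DateFrom, Completed,
           SUM(Completed) OVER (PARTITION BY EmployeeId ORDER BY DateFrom DESC) AS completed_seqnum
    FROM Test
    ORDER BY DateFrom
""").fetchall()
for row in seq_rows:
    print(row)
# ('2021-01-01', 0, 2)
# ('2021-01-02', 1, 2)
# ('2021-01-03', 0, 1)
# ('2021-01-04', 1, 1)

# Rows with completed_seqnum = 1 form the last period; its start is their minimum date.
last_period = min(d for d, _, s in seq_rows if s == 1)
print(last_period)  # 2021-01-03
```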
I think you just need conditional aggregation, with a bunch of logic. Assuming you have a row for every day, I think this is what you want:
-- SQL Server does not allow min/max on a bit column, so completed is converted to int first
select employeeid,
       (case when -- case 4
                  min(convert(int, completed)) = max(convert(int, completed)) and
                  min(convert(int, completed)) = 1
             then max(datefrom)
             when -- case 1
                  min(convert(int, completed)) = max(convert(int, completed)) and
                  min(convert(int, completed)) = 0
             then min(datefrom)
             when -- case 3
                  max(datefrom) = max(case when completed = 1 then datefrom end)
             then min(case when completed_seqnum = 1 then datefrom end)
             else dateadd(day, 1, max(case when completed = 1 then datefrom end))
        end) as datefrom
from (select t.*,
             sum(case when completed = 1 then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
      from t
     ) t
group by employeeid;
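Requiring one row per day is really just a convenience: for example, it lets the code add one day to get the date after a particular 'true'. This could also be done with lead() in the subquery.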
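The difference between adding a day and using lead() shows up as soon as dates have gaps. A minimal sketch on the second example from the question, again assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server (SQLite's `date(x, '+1 day')` plays the role of `dateadd(day, 1, x)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (EmployeeID INTEGER, DateFrom TEXT, Completed INTEGER)")
# Second example from the question: dates have gaps and the last period is still open.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2021-01-01", 0), (1, "2021-01-05", 1), (1, "2021-01-09", 0), (1, "2021-01-10", 0),
     (2, "2021-01-07", 1), (2, "2021-01-15", 0)],
)
comparison = conn.execute("""
    SELECT EmployeeID,
           -- "next day" after the last completed row (only right if every day has a row)
           MAX(CASE WHEN Completed = 1 THEN date(DateFrom, '+1 day') END) AS plus_one_day,
           -- next existing row after the last completed row
           MAX(CASE WHEN Completed = 1 THEN next_datefrom END) AS via_lead
    FROM (SELECT t.*,
                 LEAD(DateFrom) OVER (PARTITION BY EmployeeID ORDER BY DateFrom) AS next_datefrom
          FROM t) AS s
    GROUP BY EmployeeID
    ORDER BY EmployeeID
""").fetchall()
print(comparison)
# [(1, '2021-01-06', '2021-01-09'), (2, '2021-01-08', '2021-01-15')]
```

Only the lead() column matches the expected 2021-01-09 and 2021-01-15.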
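Note: This does not handle every condition (at least not with non-NULL dates). For example, it returns NULL when the data ends with a run of 'true' values.
If that is a problem: this version of your question has already been asked. Please ask a new question with appropriate sample data and desired results. I also think you could explain the problem you are trying to solve and simplify the explanation.
EDIT:
If dates can be missing, you can use: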
-- SQL Server does not allow min/max on a bit column, so completed is converted to int first
select employeeid,
       (case when -- case 4
                  min(convert(int, completed)) = max(convert(int, completed)) and
                  min(convert(int, completed)) = 1
             then max(datefrom)
             when -- case 1
                  min(convert(int, completed)) = max(convert(int, completed)) and
                  min(convert(int, completed)) = 0
             then min(datefrom)
             when -- case 3
                  max(datefrom) = max(case when completed = 1 then datefrom end)
             then min(case when completed_seqnum = 1 then datefrom end)
             else max(case when completed = 1 then next_datefrom end)
        end) as datefrom
from (select t.*,
             lead(datefrom) over (partition by employeeid order by datefrom) as next_datefrom,
             sum(case when completed = 1 then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
      from t
     ) t
group by employeeid;
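A sketch that runs the edited query against all four examples from the question, assuming SQLite 3.25+ via Python's `sqlite3` in place of SQL Server. The `min = max` tests are written as `MIN(Completed) = 1` and `MAX(Completed) = 0`, which is equivalent when Completed only holds 0 or 1; `last_period_start_by_employee` is a hypothetical helper name.

```python
import sqlite3

# Port of the edited query to SQLite (assumption: bit -> INTEGER 0/1, date -> ISO TEXT).
QUERY = """
SELECT EmployeeID,
       CASE
         WHEN MIN(Completed) = 1 THEN MAX(DateFrom)                    -- case 4: all completed
         WHEN MAX(Completed) = 0 THEN MIN(DateFrom)                    -- case 1: none completed
         WHEN MAX(DateFrom) = MAX(CASE WHEN Completed = 1 THEN DateFrom END)
           THEN MIN(CASE WHEN completed_seqnum = 1 THEN DateFrom END)  -- case 3: last row completed
         ELSE MAX(CASE WHEN Completed = 1 THEN next_datefrom END)      -- case 2: last period open
       END AS last_period_start
FROM (SELECT t.*,
             LEAD(DateFrom) OVER (PARTITION BY EmployeeID ORDER BY DateFrom) AS next_datefrom,
             SUM(Completed) OVER (PARTITION BY EmployeeID ORDER BY DateFrom DESC) AS completed_seqnum
      FROM t) AS s
GROUP BY EmployeeID
ORDER BY EmployeeID
"""


def last_period_start_by_employee(rows):
    """rows: (DateFrom, Completed, EmployeeID) tuples -> {EmployeeID: start date}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (DateFrom TEXT, Completed INTEGER, EmployeeID INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


# Fourth example from the question
example4 = [("2021-01-01", 0, 1), ("2021-01-05", 0, 1), ("2021-01-09", 1, 1),
            ("2021-01-10", 1, 1), ("2021-01-07", 1, 2)]
print(last_period_start_by_employee(example4))
# {1: '2021-01-10', 2: '2021-01-07'}
```

Note that the case 3 branch also returns the right answer for the mixed form of case 4 (0, 0, 1, 1): only the last row has completed_seqnum = 1 there.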