Number based on condition
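I am trying to generate a number based on a condition.
The dense rank has to restart whenever the 'Stop' column is Yes within the Client partition ordered by Start_Date. I have tried a few things, but the result is still not what I want.
My table, with the current number and the expected number: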
+-----------+------------+------+------------+-------------+
| Client_No | Start_Date | Stop | Current_No | Expected_No |
+-----------+------------+------+------------+-------------+
| 1 | 1-1-2018 | No | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 1 | 1-2-2018 | No | 2 | 2 |
+-----------+------------+------+------------+-------------+
| 1 | 1-3-2018 | No | 3 | 3 |
+-----------+------------+------+------------+-------------+
| 1 | 1-4-2018 | Yes | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 1 | 1-5-2018 | No | 4 | 2 |
+-----------+------------+------+------------+-------------+
| 1 | 1-6-2018 | No | 5 | 3 |
+-----------+------------+------+------------+-------------+
| 2 | 1-2-2018 | No | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 2 | 1-3-2018 | No | 2 | 2 |
+-----------+------------+------+------------+-------------+
| 2 | 1-4-2018 | Yes | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 2 | 1-5-2018 | No | 3 | 2 |
+-----------+------------+------+------------+-------------+
| 2 | 1-6-2018 | Yes | 2 | 1 |
+-----------+------------+------+------------+-------------+
The query I currently use:
DENSE_RANK() OVER(PARTITION BY Client_No, Stop ORDER BY Start_Date ASC)
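This does not seem to be the solution, because partitioning by Stop makes it number the 'No' rows and the 'Yes' rows as two separate sequences, but I don't know how to handle it another way.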
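One way to solve this kind of Gaps-And-Islands puzzle is to first calculate a running group number that increases at each 'Yes' stop.
Then calculate row_number or dense_rank partitioned by that group number as well.
For example: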
create table test
(
Id int identity(1,1) primary key,
Client_No int,
Start_Date date,
Stop varchar(3)
)
insert into test
(Client_No, Start_Date, Stop) values
(1,'2018-01-01','No')
,(1,'2018-02-01','No')
,(1,'2018-03-01','No')
,(1,'2018-04-01','Yes')
,(1,'2018-05-01','No')
,(1,'2018-06-01','No')
,(2,'2018-02-01','No')
,(2,'2018-03-01','No')
,(2,'2018-04-01','Yes')
,(2,'2018-05-01','No')
,(2,'2018-06-01','Yes')
select *
, row_number() over (partition by Client_no, Rnk order by start_date) as rn
from
(
select *
, sum(case when Stop = 'Yes' then 1 else 0 end) over (partition by Client_No order by start_date) rnk
from test
) q
order by Client_No, start_date
GO
Id | Client_No | Start_Date | Stop | rnk | rn
-: | --------: | :------------------ | :--- | --: | :-
1 | 1 | 01/01/2018 00:00:00 | No | 0 | 1
2 | 1 | 01/02/2018 00:00:00 | No | 0 | 2
3 | 1 | 01/03/2018 00:00:00 | No | 0 | 3
4 | 1 | 01/04/2018 00:00:00 | Yes | 1 | 1
5 | 1 | 01/05/2018 00:00:00 | No | 1 | 2
6 | 1 | 01/06/2018 00:00:00 | No | 1 | 3
7 | 2 | 01/02/2018 00:00:00 | No | 0 | 1
8 | 2 | 01/03/2018 00:00:00 | No | 0 | 2
9 | 2 | 01/04/2018 00:00:00 | Yes | 1 | 1
10 | 2 | 01/05/2018 00:00:00 | No | 1 | 2
11 | 2 | 01/06/2018 00:00:00 | Yes | 2 | 1
db<>fiddle here
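The difference between using this:
row_number() over (partition by Client_no, Rnk order by start_date)
versus this:
dense_rank() over (partition by Client_no, Rnk order by start_date)
is that dense_rank gives rows with the same start_date the same number within each Client_no & Rnk, while row_number always gives each row its own number.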
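To make that difference concrete, here is a minimal sketch using hypothetical sample rows (not the asker's data) in which one client has two rows on the same Start_Date. The CTE names sample_rows and grouped and the column aliases rn and dr are assumptions made for illustration.

```sql
-- Hypothetical rows: client 1 has two entries dated 2018-05-01
with sample_rows as
(
    select * from (values
        (1, cast('2018-04-01' as date), 'Yes'),
        (1, cast('2018-05-01' as date), 'No'),
        (1, cast('2018-05-01' as date), 'No'),
        (1, cast('2018-06-01' as date), 'No')
    ) v (Client_No, Start_Date, Stop)
),
grouped as
(
    -- Running count of 'Yes' rows per client; every row here lands in group 1
    select *,
        sum(case when Stop = 'Yes' then 1 else 0 end)
            over (partition by Client_No order by Start_Date) as rnk
    from sample_rows
)
select Client_No, Start_Date, Stop, rnk,
    -- row_number: every row gets its own number, ties are broken arbitrarily
    row_number() over (partition by Client_No, rnk order by Start_Date) as rn,
    -- dense_rank: rows sharing a Start_Date share a number, with no gap afterwards
    dense_rank() over (partition by Client_No, rnk order by Start_Date) as dr
from grouped
order by Client_No, Start_Date, rn;
```

With these rows, rn runs 1, 2, 3, 4 while dr runs 1, 2, 2, 3. If duplicate dates per client cannot occur, the two functions return the same numbers.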
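Below is an approach that gives you the desired output. You can see a live/working demo here.
The steps involved are:
- Create an adjusted stop value, where we mark 'Stop' as 'Yes' for the first row of each client
- Create a separate table that contains only the rows where we want to start/restart the count
- For each row in this new table, also add an end date, which is basically the date of the next row for that client, or a date in the future for the last row
- Join the original data table with the new table and run a sequence based on this new calculation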
-- Assumes the source rows are available as a table named data (Client_No, Start_Date, Stop)
-- 1. Creating adjusted stop value
with data_adjusted_stop as
(
select *,
case when row_number() over(partition by Client_No order by Start_Date asc) = 1 then 'Yes' else Stop end as adjusted_stop
from data
),
-- 2. Extracting the rows where we will want to (re)start the counting
data_with_cycle as
(
select Client_No,
row_number() over(partition by Client_No order by Start_Date asc) adjusted_stop_cycle,
Start_Date
from data_adjusted_stop
where adjusted_stop = 'Yes'
),
-- 3. Adding an End_Date column for each row where we will want to (re)start counting
data_with_end_date as
(
select *,
coalesce(lead(Start_Date) over (partition by Client_No order by Start_Date asc), '2021-01-01') as End_Date
from data_with_cycle
)
-- 4. Running a sequence partitioned by Client_No and the stop cycle
select data.*,
row_number() over(partition by data.Client_No, data_with_end_date.adjusted_stop_cycle order by data.Start_Date asc) as desired_output_sequence
from data
left join data_with_end_date
on data_with_end_date.Client_no = data.Client_no
where data.Start_Date >= data_with_end_date.Start_Date
and data.Start_Date < data_with_end_date.End_Date
order by 1, 2
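One caveat with the query above: the fixed '2021-01-01' end date in step 3 means any row dated on or after that day matches no cycle and is dropped by the where clause. A minimal sketch of one way around this, assuming the same column names, is to leave End_Date as NULL for the last cycle and treat NULL as open-ended in the join. The sample rows in the data CTE and the aliases d and c are hypothetical.

```sql
-- Hypothetical sample rows, including one dated after 2021-01-01
with data as
(
    select * from (values
        (1, cast('2018-01-01' as date), 'No'),
        (1, cast('2018-04-01' as date), 'Yes'),
        (1, cast('2022-05-01' as date), 'No')
    ) v (Client_No, Start_Date, Stop)
),
-- Treat the first row of each client as a (re)start point
data_adjusted_stop as
(
    select *,
        case when row_number() over (partition by Client_No order by Start_Date) = 1
             then 'Yes' else Stop end as adjusted_stop
    from data
),
-- One row per cycle; End_Date stays NULL for the last cycle of each client
data_with_cycle as
(
    select Client_No,
        row_number() over (partition by Client_No order by Start_Date) as adjusted_stop_cycle,
        Start_Date,
        lead(Start_Date) over (partition by Client_No order by Start_Date) as End_Date
    from data_adjusted_stop
    where adjusted_stop = 'Yes'
)
select d.*,
    row_number() over (partition by d.Client_No, c.adjusted_stop_cycle
                       order by d.Start_Date) as desired_output_sequence
from data d
join data_with_cycle c
    on c.Client_No = d.Client_No
   and d.Start_Date >= c.Start_Date
   -- A NULL End_Date means the cycle has no upper bound
   and (c.End_Date is null or d.Start_Date < c.End_Date)
order by d.Client_No, d.Start_Date;
```

For these three rows the sequence comes out as 1, 1, 2, and the 2022 row is kept. The range conditions sit in the join rather than in a where clause. This is equivalent to the original query, whose where clause already turns its left join into an inner join.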