Number based on condition

Question

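I am trying to generate a number based on a condition. Within each Client partition, ordered by Start_Date, the dense rank must restart whenever the 'Stop' column is 'Yes'. I have tried a few things, but the result is still not what I want. Here is my table with the current number and the expected number: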

+-----------+------------+------+------------+-------------+
| Client_No | Start_Date | Stop | Current_No | Expected_No |
+-----------+------------+------+------------+-------------+
|     1     |  1-1-2018  |  No  |      1     |      1      |
+-----------+------------+------+------------+-------------+
|     1     |  1-2-2018  |  No  |      2     |      2      |
+-----------+------------+------+------------+-------------+
|     1     |  1-3-2018  |  No  |      3     |      3      |
+-----------+------------+------+------------+-------------+
|     1     |  1-4-2018  |  Yes |      1     |      1      |
+-----------+------------+------+------------+-------------+
|     1     |  1-5-2018  |  No  |      4     |      2      |
+-----------+------------+------+------------+-------------+
|     1     |  1-6-2018  |  No  |      5     |      3      |
+-----------+------------+------+------------+-------------+
|     2     |  1-2-2018  |  No  |      1     |      1      |
+-----------+------------+------+------------+-------------+
|     2     |  1-3-2018  |  No  |      2     |      2      |
+-----------+------------+------+------------+-------------+
|     2     |  1-4-2018  |  Yes |      1     |      1      |
+-----------+------------+------+------------+-------------+
|     2     |  1-5-2018  |  No  |      3     |      2      |
+-----------+------------+------+------------+-------------+
|     2     |  1-6-2018  |  Yes |      2     |      1      |
+-----------+------------+------+------------+-------------+

The query I am currently using:

DENSE_RANK() OVER(PARTITION BY Client_No, Stop ORDER BY Start_Date ASC)

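This does not seem to be the solution: because Stop is part of the partition, the 'No' rows are numbered continuously across the 'Yes' rows instead of restarting. I don't know how to approach it another way.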

Answer 1

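One way to solve this kind of gaps-and-islands puzzle is to first calculate a ranking that increases at each 'Yes' stop, i.e. a running count of the 'Yes' rows per client.

Then calculate the row_number or dense_rank partitioned by that ranking as well.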

For example:

create table test 
(
  Id int identity(1,1) primary key,
  Client_No int,
  Start_Date date,
  Stop varchar(3)
)

insert into test 
(Client_No, Start_Date, Stop) values
  (1,'2018-01-01','No')
 ,(1,'2018-02-01','No')
 ,(1,'2018-03-01','No')
 ,(1,'2018-04-01','Yes')
 ,(1,'2018-05-01','No')
 ,(1,'2018-06-01','No')

 ,(2,'2018-02-01','No')
 ,(2,'2018-03-01','No')
 ,(2,'2018-04-01','Yes')
 ,(2,'2018-05-01','No')
 ,(2,'2018-06-01','Yes')

select *
, row_number() over (partition by Client_no, Rnk order by start_date) as rn
from
(
  select *
  , sum(case when Stop = 'Yes' then 1 else 0 end) over (partition by Client_No order by start_date) rnk
  from test
) q
order by Client_No, start_date
GO

Id | Client_No | Start_Date          | Stop | rnk | rn
-: | --------: | :------------------ | :--- | --: | :-
 1 |         1 | 01/01/2018 00:00:00 | No   |   0 | 1 
 2 |         1 | 01/02/2018 00:00:00 | No   |   0 | 2 
 3 |         1 | 01/03/2018 00:00:00 | No   |   0 | 3 
 4 |         1 | 01/04/2018 00:00:00 | Yes  |   1 | 1 
 5 |         1 | 01/05/2018 00:00:00 | No   |   1 | 2 
 6 |         1 | 01/06/2018 00:00:00 | No   |   1 | 3 
 7 |         2 | 01/02/2018 00:00:00 | No   |   0 | 1 
 8 |         2 | 01/03/2018 00:00:00 | No   |   0 | 2 
 9 |         2 | 01/04/2018 00:00:00 | Yes  |   1 | 1 
10 |         2 | 01/05/2018 00:00:00 | No   |   1 | 2 
11 |         2 | 01/06/2018 00:00:00 | Yes  |   2 | 1

db<>fiddle here

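The difference between using this:

row_number() over (partition by Client_no, Rnk order by start_date)

versus this:

dense_rank() over (partition by Client_no, Rnk order by start_date)

is that dense_rank assigns the same number to rows that share the same start_date within a Client_no & Rnk, whereas row_number still gives every row its own number.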
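
To make that difference concrete, here is a minimal sketch that runs both functions side by side. It is not part of the original answer: it uses Python's built-in sqlite3 module instead of SQL Server (assuming an SQLite build with window functions, 3.25 or later), and the four rows, including the duplicated Start_Date, are made up for illustration.

```python
import sqlite3

# Hypothetical sample (not from the question): two rows of client 1
# share the same Start_Date inside the same rnk group.
rows = [
    (1, "2018-04-01", "Yes"),
    (1, "2018-05-01", "No"),
    (1, "2018-05-01", "No"),   # tie on Start_Date
    (1, "2018-06-01", "No"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table test (Client_No int, Start_Date text, Stop text)")
conn.executemany("insert into test values (?, ?, ?)", rows)

query = """
select Start_Date,
       row_number() over (partition by Client_No, rnk order by Start_Date) as rn,
       dense_rank() over (partition by Client_No, rnk order by Start_Date) as dr
from (
  -- running count of 'Yes' rows per client, as in the answer's query
  select *,
         sum(case when Stop = 'Yes' then 1 else 0 end)
           over (partition by Client_No order by Start_Date) as rnk
  from test
) q
order by Start_Date, rn
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # (Start_Date, rn, dr)
```

On this sample, rn comes out as 1, 2, 3, 4 while dr comes out as 1, 2, 2, 3, so which one you want depends on whether duplicate dates should count once or separately.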

Answer 2

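Below is one approach that gives you the desired output. You can see a live/working demo here.

The steps involved are:

1. Create an adjusted stop value, where the first row of each client is marked with Stop = 'Yes'
2. Create a separate table that contains only the rows where we want to start/restart the count
3. For each row in this new table, also add an end date, which is the date of the next such row for the same client, or a date in the future for the last row
4. Join the original data table with the new table and run a sequence partitioned by this newly calculated cycle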

-- Assumes the source rows live in a table (or earlier CTE) named data
-- 1. Creating adjusted stop value
with data_adjusted_stop as
(
select      *,
            case when row_number() over(partition by Client_No order by Start_Date asc) = 1 then 'Yes' else Stop end as adjusted_stop
from        data
),

-- 2. Extracting the rows where we will want to (re)start the counting
data_with_cycle as
(
select      Client_No,
            row_number() over(partition by Client_No order by Start_Date asc) adjusted_stop_cycle,
            Start_Date
from        data_adjusted_stop
where       adjusted_stop = 'Yes'
),

-- 3. Adding an End_Date column for each row where we will want to (re)start counting
data_with_end_date as
(
select      *,
            coalesce(lead(Start_Date) over (partition by Client_No order by Start_Date asc), '9999-12-31') as End_Date -- far-future sentinel for the last cycle
from        data_with_cycle
)

-- 4. Running a sequence partitioned by Client_No and the stop cycle
select      data.*,
            row_number() over(partition by data.Client_No, data_with_end_date.adjusted_stop_cycle order by data.Start_Date asc) as desired_output_sequence
from        data
left join   data_with_end_date
            on data_with_end_date.Client_no = data.Client_no
where       data.Start_Date >= data_with_end_date.Start_Date
and         data.Start_Date < data_with_end_date.End_Date 
order by    1, 2
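
The steps above can be checked against the question's Expected_No column. The following is only a sketch under a few assumptions of mine: it runs the query through Python's built-in sqlite3 module rather than SQL Server (SQLite 3.25 or later is needed for window functions), stores dates as ISO text, stores Expected_No alongside the data purely for the comparison, and uses '9999-12-31' as the far-future end date.

```python
import sqlite3

# Sample rows from the question: (Client_No, Start_Date, Stop, Expected_No)
rows = [
    (1, "2018-01-01", "No", 1), (1, "2018-02-01", "No", 2),
    (1, "2018-03-01", "No", 3), (1, "2018-04-01", "Yes", 1),
    (1, "2018-05-01", "No", 2), (1, "2018-06-01", "No", 3),
    (2, "2018-02-01", "No", 1), (2, "2018-03-01", "No", 2),
    (2, "2018-04-01", "Yes", 1), (2, "2018-05-01", "No", 2),
    (2, "2018-06-01", "Yes", 1),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table data (Client_No int, Start_Date text, Stop text, Expected_No int)"
)
conn.executemany("insert into data values (?, ?, ?, ?)", rows)

query = """
with data_adjusted_stop as (          -- step 1: first row per client counts as 'Yes'
  select *,
         case when row_number() over (partition by Client_No order by Start_Date) = 1
              then 'Yes' else Stop end as adjusted_stop
  from data
),
data_with_cycle as (                  -- step 2: only the (re)start rows
  select Client_No,
         row_number() over (partition by Client_No order by Start_Date) as adjusted_stop_cycle,
         Start_Date
  from data_adjusted_stop
  where adjusted_stop = 'Yes'
),
data_with_end_date as (               -- step 3: each cycle ends where the next one starts
  select *,
         coalesce(lead(Start_Date) over (partition by Client_No order by Start_Date),
                  '9999-12-31') as End_Date
  from data_with_cycle
)
select d.Client_No, d.Start_Date, d.Expected_No,   -- step 4: number rows within each cycle
       row_number() over (partition by d.Client_No, c.adjusted_stop_cycle
                          order by d.Start_Date) as desired_output_sequence
from data d
join data_with_end_date c
  on c.Client_No = d.Client_No
 and d.Start_Date >= c.Start_Date
 and d.Start_Date <  c.End_Date
order by d.Client_No, d.Start_Date
"""
result = conn.execute(query).fetchall()
for client_no, start_date, expected, actual in result:
    print(client_no, start_date, expected, actual)
```

The sketch writes the range conditions into a plain join instead of a left join followed by a where clause. The two should behave the same here, because the where filter on the joined table discards unmatched rows anyway.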

sql

tsql

sql-server

gaps-and-islands