SQL Server: assign the same row number to rows until a condition is met
I have the following dataset:
ukey id code create_date
1 1082 9053 2018-03-01 23:18:51.0000000
2 1082 9035 2018-03-01 23:19:21.0000000
3 1082 9053 2018-03-01 23:22:55.0000000
4 1082 9535 2018-03-01 23:23:30.0000000
5 1196 3145 2018-03-05 07:27:15.0000000
6 1196 3162 2018-03-05 07:27:50.0000000
7 1196 3175 2018-03-05 07:28:24.0000000
8 1196 3235 2018-03-05 07:28:57.0000000
9 1196 3295 2018-03-05 07:29:31.0000000
10 1196 3448 2018-03-05 07:30:04.0000000
11 1196 3465 2018-03-05 07:30:37.0000000
12 1196 4265 2018-03-05 07:31:09.0000000
13 1196 3495 2018-03-05 17:13:13.0000000
14 473 551 2018-03-02 16:43:52.0000000
15 473 590 2018-03-02 16:44:30.0000000
16 473 835 2018-03-02 16:45:02.0000000
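I want to add another column to this dataset, say RN. RN should keep the same value until either of the following conditions is met:
- the id changes, or
- the datediff between the current row and the previous row exceeds 60 seconds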
So the final dataset should look like this:
ukey id code create_date RN
1 1082 9053 2018-03-01 23:18:51.0000000 1
2 1082 9035 2018-03-01 23:19:21.0000000 1
3 1082 9053 2018-03-01 23:22:55.0000000 1
4 1082 9535 2018-03-01 23:23:30.0000000 1
5 1196 3145 2018-03-05 07:27:15.0000000 2
6 1196 3162 2018-03-05 07:27:50.0000000 2
7 1196 3175 2018-03-05 07:28:24.0000000 2
8 1196 3235 2018-03-05 07:28:57.0000000 2
9 1196 3295 2018-03-05 07:29:31.0000000 2
10 1196 3448 2018-03-05 07:30:04.0000000 2
11 1196 3465 2018-03-05 07:30:37.0000000 2
12 1196 4265 2018-03-05 07:31:09.0000000 2
13 1196 3495 2018-03-05 17:13:13.0000000 3
14 473 551 2018-03-02 16:43:52.0000000 4
15 473 590 2018-03-02 16:44:30.0000000 4
16 473 835 2018-03-02 16:45:02.0000000 4
Note the change in RN from 2 to 3: the id is the same, but the datediff exceeds 60 seconds.
This is the query I have tried so far. My original dataset is in dbo.myTable:
select
    a.id as A_ID,
    a.code as A_Code,
    a.create_date as A_CreateDate,
    a.RN as A_RN,
    _.id as _ID,
    _.code as _Code,
    _.create_date as _CreateDate,
    _.RN as _RN
from myTable a
cross apply
(
    select * from myTable b
    where a.id=b.id and (a.ukey=b.ukey-1 or a.ukey=1) and datediff(ss,a.create_date,b.create_date)<60
)_
order by a.id, a.create_date
This query does not give me the expected result. My idea is to compare each row with the previous row and check both conditions.
This sounds like you want lag() and a cumulative sum:
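-- t stands for the source table (dbo.myTable in the question)
select t.*,
       -- add 1 whenever a new group starts: first row of an id (prev is NULL)
       -- or a gap of 60 seconds or more since the previous row of that id
       sum(case when prev_create_date > dateadd(second, -60, create_date)
                then 0 else 1
           end) over (order by ukey) as rn
from (select t.*,
             lag(create_date) over (partition by id order by create_date) as prev_create_date
      from t
     ) t;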
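To make the logic of the query above easier to follow, here is a minimal sketch of the same two steps (previous timestamp per id, then a running count of group starts) in plain Python. It is only an illustration, not T-SQL: the function name `assign_rn`, the parameter `max_gap_seconds`, and the reduced sample of rows are assumptions made for this example, and fractional seconds are dropped from the timestamps.

```python
from datetime import datetime

# A reduced sample of (ukey, id, create_date) rows from the question's dataset.
rows = [
    (1, 1082, "2018-03-01 23:18:51"),
    (2, 1082, "2018-03-01 23:19:21"),
    (12, 1196, "2018-03-05 07:31:09"),
    (13, 1196, "2018-03-05 17:13:13"),
    (14, 473, "2018-03-02 16:43:52"),
    (15, 473, "2018-03-02 16:44:30"),
]


def assign_rn(rows, max_gap_seconds=60):
    """Return {ukey: rn}; hypothetical helper mirroring lag() plus a running sum."""
    # Step 1: like lag(create_date) over (partition by id order by create_date).
    # For every row, remember the previous timestamp seen for the same id.
    prev_and_cur = {}
    last_seen = {}  # id -> most recent create_date for that id
    for ukey, id_, ts in sorted(rows, key=lambda r: (r[1], r[2])):
        cur = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        prev_and_cur[ukey] = (last_seen.get(id_), cur)
        last_seen[id_] = cur

    # Step 2: like sum(case ... end) over (order by ukey).
    # A row continues the current group only if a previous row exists for the
    # same id and it is less than max_gap_seconds older; otherwise rn goes up.
    rn = 0
    result = {}
    for ukey in sorted(prev_and_cur):
        prev, cur = prev_and_cur[ukey]
        same_group = prev is not None and (cur - prev).total_seconds() < max_gap_seconds
        rn += 0 if same_group else 1
        result[ukey] = rn
    return result


print(assign_rn(rows))
# {1: 1, 2: 1, 12: 2, 13: 3, 14: 4, 15: 4}
```

As in the query, the first row of each id has no previous timestamp, so it always starts a new group. Note that in the full dataset rows 2 and 3 are 214 seconds apart, so under the 60-second rule a new group would also start at ukey 3, which differs from the expected output listed in the question.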