Assign sequential numbers to rows with repeating values
I have the following table:
ITEM DATE VALUE
----------------------
ITEM1 2016-05-04 1
ITEM1 2016-05-05 3
ITEM1 2016-05-06 3
ITEM1 2016-05-09 3
ITEM1 2016-05-04 4
ITEM2 2016-05-10 1
ITEM2 2016-05-05 2
ITEM2 2016-05-06 3
ITEM2 2016-05-09 1
ITEM2 2016-05-10 1
I would like to know, for each item, how many consecutive entries over time have had the same value (i.e. stayed flat):
ITEM DATE VALUE NUM_FLAT_ENTRYPOINTS
------------------------------
ITEM1 2016-05-04 1 0
ITEM1 2016-05-05 3 0
ITEM1 2016-05-06 3 1
ITEM1 2016-05-09 3 2
ITEM1 2016-05-10 4 0
ITEM2 2016-05-04 1 0
ITEM2 2016-05-05 2 0
ITEM2 2016-05-06 3 0
ITEM2 2016-05-09 1 0
ITEM2 2016-05-10 1 1
My initial idea was:
select
*,
rank() over (partition by ITEM, VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table
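However, this does not work, because for ITEM2 it partitions 2016-05-04, 2016-05-09 and 2016-05-10 together and shows 2 instead of 1 in NUM_FLAT_ENTRYPOINTS for the last row.
I am using Microsoft SQL Server 2008.
Any ideas?
Edit:
In Oracle (and probably other SQL servers) it seems I can do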
select
count(VALUE) over (partition by ITEM, VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table
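But as far as I can tell, this syntax does not work in SQL Server 2008. Is there any workaround?
Assuming the correction to the sample data that I suggested in the comments, this seems to fit the bill: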
declare @t table (ITEM char(5), Date date, Value tinyint)
insert into @t(ITEM,DATE,VALUE) values
('ITEM1','20160504',1),
('ITEM1','20160505',3),
('ITEM1','20160506',3),
('ITEM1','20160509',3),
('ITEM1','20160510',4),
('ITEM2','20160504',1),
('ITEM2','20160505',2),
('ITEM2','20160506',3),
('ITEM2','20160509',1),
('ITEM2','20160510',1)
;With Ordered as (
select
Item,
Date,
Value,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Date) as rn
from @t
)
select
*,
COALESCE(rn -
(select MAX(o2.rn) from Ordered o2
where o2.ITEM = o.ITEM and
o2.rn < o.rn and
o2.Value != o.Value) - 1
, o.rn - 1) as NUM_FLAT_ENTRYPOINTS
from
Ordered o
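That is, we assign row numbers to the rows (separately for each item), and then we simply find the latest row number, earlier than the current one, where Value is different. Subtracting these row numbers (and a further 1) produces the answer we need, assuming such an earlier row can be found. If there is no such earlier row, then we are obviously in a sequence at the start of a particular item, so we just subtract 1 from the row number.
I've gone for "obviously correct" here. There may be a way to produce the result that performs better, but I'm not aiming for that right now.
Results: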
Item Date Value rn NUM_FLAT_ENTRYPOINTS
----- ---------- ----- -------------------- --------------------
ITEM1 2016-05-04 1 1 0
ITEM1 2016-05-05 3 2 0
ITEM1 2016-05-06 3 3 1
ITEM1 2016-05-09 3 4 2
ITEM1 2016-05-10 4 5 0
ITEM2 2016-05-04 1 1 0
ITEM2 2016-05-05 2 2 0
ITEM2 2016-05-06 3 3 0
ITEM2 2016-05-09 1 4 0
ITEM2 2016-05-10 1 5 1
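To make the row-number reasoning easier to follow, here is a rough plain-Python trace of the same logic. It is only a sketch for illustration, not T-SQL and not a replacement for the query. The names `ROWS` and `flat_counts_by_lookback` are made up for this example, and the rows are the corrected sample data used in this answer.

```python
from itertools import groupby

# Corrected sample data: (item, date, value). ISO dates sort correctly as strings.
ROWS = [
    ("ITEM1", "2016-05-04", 1),
    ("ITEM1", "2016-05-05", 3),
    ("ITEM1", "2016-05-06", 3),
    ("ITEM1", "2016-05-09", 3),
    ("ITEM1", "2016-05-10", 4),
    ("ITEM2", "2016-05-04", 1),
    ("ITEM2", "2016-05-05", 2),
    ("ITEM2", "2016-05-06", 3),
    ("ITEM2", "2016-05-09", 1),
    ("ITEM2", "2016-05-10", 1),
]


def flat_counts_by_lookback(rows):
    """Hypothetical helper mirroring the correlated subquery: for each row,
    look back for the latest earlier row of the same item whose value differs."""
    result = []
    by_item = sorted(rows, key=lambda r: (r[0], r[1]))
    for item, group in groupby(by_item, key=lambda r: r[0]):
        ordered = list(group)  # rows of one item, already in date order
        for rn, (_, date, value) in enumerate(ordered, start=1):
            # Row numbers of earlier rows with a different value (the subquery).
            earlier_diff = [
                rn2
                for rn2, (_, _, v2) in enumerate(ordered, start=1)
                if rn2 < rn and v2 != value
            ]
            if earlier_diff:
                count = rn - max(earlier_diff) - 1
            else:
                # No earlier differing row: the run starts at the item's first row.
                count = rn - 1  # the COALESCE fallback
            result.append((item, date, value, count))
    return result


for row in flat_counts_by_lookback(ROWS):
    print(row)
```

Like the subquery, the inner list scans all earlier rows of the item for every row, so the work grows quadratically with the number of rows per item.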
It looks like a variation of gaps-and-islands.
Sample data
DECLARE @T TABLE (ITEM varchar(50), dt date, VALUE int);
INSERT INTO @T(ITEM, dt, VALUE) VALUES
('ITEM1', '2016-05-04', 1),
('ITEM1', '2016-05-05', 3),
('ITEM1', '2016-05-06', 3),
('ITEM1', '2016-05-09', 3),
('ITEM1', '2016-05-10', 4),
('ITEM2', '2016-05-04', 1),
('ITEM2', '2016-05-05', 2),
('ITEM2', '2016-05-06', 3),
('ITEM2', '2016-05-09', 1),
('ITEM2', '2016-05-10', 1);
Query
WITH
CTE
AS
(
SELECT
ITEM
,dt
,VALUE
,ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY dt) AS rn1
,ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY dt) AS rn2
FROM @T
)
SELECT
ITEM
,dt
,VALUE
,rn1-rn2 AS rnDiff
,ROW_NUMBER() OVER
(PARTITION BY ITEM, VALUE, rn1-rn2 ORDER BY dt) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM CTE
ORDER BY ITEM, dt;
Result
+-------+------------+-------+--------+----------------------+
| ITEM | dt | VALUE | rnDiff | NUM_FLAT_ENTRYPOINTS |
+-------+------------+-------+--------+----------------------+
| ITEM1 | 2016-05-04 | 1 | 0 | 0 |
| ITEM1 | 2016-05-05 | 3 | 1 | 0 |
| ITEM1 | 2016-05-06 | 3 | 1 | 1 |
| ITEM1 | 2016-05-09 | 3 | 1 | 2 |
| ITEM1 | 2016-05-10 | 4 | 4 | 0 |
| ITEM2 | 2016-05-04 | 1 | 0 | 0 |
| ITEM2 | 2016-05-05 | 2 | 1 | 0 |
| ITEM2 | 2016-05-06 | 3 | 2 | 0 |
| ITEM2 | 2016-05-09 | 1 | 2 | 0 |
| ITEM2 | 2016-05-10 | 1 | 2 | 1 |
+-------+------------+-------+--------+----------------------+
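Why the difference works: within one unbroken run of the same value, `rn1` and `rn2` both increase by 1 per row, so `rn1 - rn2` stays constant. Once a different value interrupts the run, `rn1` moves ahead while `rn2` does not, so a later run of the same value gets a different difference. The following plain-Python trace is a sketch of that idea only (it is not T-SQL); `ROWS` and `flat_counts_by_islands` are hypothetical names chosen for this illustration.

```python
from collections import defaultdict

# Same sample data as above: (item, dt, value). ISO dates sort correctly as strings.
ROWS = [
    ("ITEM1", "2016-05-04", 1),
    ("ITEM1", "2016-05-05", 3),
    ("ITEM1", "2016-05-06", 3),
    ("ITEM1", "2016-05-09", 3),
    ("ITEM1", "2016-05-10", 4),
    ("ITEM2", "2016-05-04", 1),
    ("ITEM2", "2016-05-05", 2),
    ("ITEM2", "2016-05-06", 3),
    ("ITEM2", "2016-05-09", 1),
    ("ITEM2", "2016-05-10", 1),
]


def flat_counts_by_islands(rows):
    """Hypothetical helper: returns (item, dt, value, rn_diff, num_flat) per row."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    rn1 = defaultdict(int)   # row number per item
    rn2 = defaultdict(int)   # row number per (item, value)
    seen = defaultdict(int)  # rows already seen per (item, value, rn1 - rn2)
    result = []
    for item, dt, value in ordered:
        rn1[item] += 1
        rn2[(item, value)] += 1
        rn_diff = rn1[item] - rn2[(item, value)]
        island = (item, value, rn_diff)
        # Rows seen so far in this island equals ROW_NUMBER() - 1 for this row.
        result.append((item, dt, value, rn_diff, seen[island]))
        seen[island] += 1
    return result


for row in flat_counts_by_islands(ROWS):
    print(row)
```

Note that `rn_diff` alone does not identify a run (ITEM2 has the value 2 for both VALUE 3 and VALUE 1 rows), which is why VALUE is part of the final partition key as well.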
Try this:
SELECT ITEM, [DATE], VALUE,
ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE, grp
ORDER BY [DATE]) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM (
SELECT ITEM, [DATE], VALUE,
ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY [DATE]) -
ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY [DATE]) AS grp
FROM mytable) AS t