Grouping the rows by similarities
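I am working on SQL Server.
I have the following table:
For each BIGroup, I have multiple VarianceNames.
For each VarianceName, I have multiple PartNumbers.
I compare each part number with the other part numbers in the same BIGroup and VarianceName, and write the number of differences between PartNumber1 and PartNumber2 in the Difference column: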
+---------+--------------+-------------+-------------+------------+-----------+
| BIGroup | VarianceName | PartNumber1 | PartNumber2 | Difference | Cluster |
+---------+--------------+-------------+-------------+------------+-----------+
| D934 | A | 11426777 | 11426777 | 0 | |
| D934 | A | 11426777 | 11426781 | 0 | |
| D934 | A | 11426777 | 12542804 | 2 | |
| D934 | A | 11426777 | 12554759 | 4 | |
| D934 | A | 11426777 | 12564258 | 0 | |
| D934 | A | 11426781 | 11426777 | 0 | |
| D934 | A | 11426781 | 11426781 | 0 | |
| D934 | A | 11426781 | 12542804 | 5 | |
| D934 | A | 11426781 | 12554759 | 1 | |
| D934 | A | 11426781 | 12564258 | 0 | |
| D934 | A | 12542804 | 11426777 | 2 | |
| D934 | A | 12542804 | 11426781 | 5 | |
| D934 | A | 12542804 | 12542804 | 0 | |
| D934 | A | 12542804 | 12554759 | 0 | |
| D934 | A | 12542804 | 12564258 | 8 | |
| D934 | A | 12554759 | 11426777 | 4 | |
| D934 | A | 12554759 | 11426781 | 1 | |
| D934 | A | 12554759 | 12542804 | 0 | |
| D934 | A | 12554759 | 12554759 | 0 | |
| D934 | A | 12554759 | 12564258 | 9 | |
| D934 | A | 12564258 | 11426777 | 0 | |
| D934 | A | 12564258 | 11426781 | 0 | |
| D934 | A | 12564258 | 12542804 | 8 | |
| D934 | A | 12564258 | 12554759 | 9 | |
| D934 | A | 12564258 | 12564258 | 0 | |
| D934 | AA | 11438878 | 11438878 | 0 | |
| D934 | AB | 11438924 | 11438924 | 0 | |
| D934 | AC | 12556213 | 12556213 | 0 | |
| D934 | AC | 12556213 | 12556214 | 5 | |
| D934 | AC | 12556214 | 12556213 | 5 | |
| D934 | AC | 12556214 | 12556214 | 0 | |
| D955 | A | 75346846 | 75346846 | 0 | |
| ... | ... | ... | ... | 0 | |
+---------+--------------+-------------+-------------+------------+-----------+
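For example:
For D934 and VarianceName A, PartNumbers 11426777, 11426781 and 12564258 are identical, because the difference is 0 between:
11426777 and 11426781,
11426781 and 12564258, and
12564258 and 11426777.
Likewise, for D934 and VarianceName A, PartNumbers 12542804 and 12554759 are identical, because the difference is 0 between:
12542804 and 12554759.
My goal is to identify all the groups of identical PartNumbers within the same BIGroup and VarianceName.
To label these groups, I will use a column named Cluster.
So 11426777, 11426781 and 12564258 belong to cluster D934-A-C1.
And 12542804 and 12554759 belong to cluster D934-A-C2.
What should the query or stored procedure be to update the Cluster column and get the following result: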
+---------+--------------+-------------+-------------+------------+-----------+
| BIGroup | VarianceName | PartNumber1 | PartNumber2 | Difference | Cluster |
+---------+--------------+-------------+-------------+------------+-----------+
| D934 | A | 11426777 | 11426777 | 0 | D934-A-C1 |
| D934 | A | 11426777 | 11426781 | 0 | D934-A-C1 |
| D934 | A | 11426777 | 12542804 | 2 | |
| D934 | A | 11426777 | 12554759 | 4 | |
| D934 | A | 11426777 | 12564258 | 0 | D934-A-C1 |
| D934 | A | 11426781 | 11426777 | 0 | D934-A-C1 |
| D934 | A | 11426781 | 11426781 | 0 | D934-A-C1 |
| D934 | A | 11426781 | 12542804 | 5 | |
| D934 | A | 11426781 | 12554759 | 1 | |
| D934 | A | 11426781 | 12564258 | 0 | D934-A-C1 |
| D934 | A | 12542804 | 11426777 | 2 | |
| D934 | A | 12542804 | 11426781 | 5 | |
| D934 | A | 12542804 | 12542804 | 0 | D934-A-C2 |
| D934 | A | 12542804 | 12554759 | 0 | D934-A-C2 |
| D934 | A | 12542804 | 12564258 | 8 | |
| D934 | A | 12554759 | 11426777 | 4 | |
| D934 | A | 12554759 | 11426781 | 1 | |
| D934 | A | 12554759 | 12542804 | 0 | D934-A-C2 |
| D934 | A | 12554759 | 12554759 | 0 | D934-A-C2 |
| D934 | A | 12554759 | 12564258 | 9 | |
| D934 | A | 12564258 | 11426777 | 0 | D934-A-C1 |
| D934 | A | 12564258 | 11426781 | 0 | D934-A-C1 |
| D934 | A | 12564258 | 12542804 | 8 | |
| D934 | A | 12564258 | 12554759 | 9 | |
| D934 | A | 12564258 | 12564258 | 0 | D934-A-C1 |
And so on for the other VarianceNames
| D934 | AA | 11438878 | 11438878 | 0 | D934-AA-C1 |
| D934 | AB | 11438924 | 11438924 | 0 | D934-AB-C1 |
| D934 | AC | 12556213 | 12556213 | 0 | D934-AC-C1 |
| D934 | AC | 12556213 | 12556214 | 5 | |
| D934 | AC | 12556214 | 12556213 | 5 | |
| D934 | AC | 12556214 | 12556214 | 0 | D934-AC-C1 |
And so on for the other BIGroups
| D955 | A | 75346846 | 75346846 | 0 | D955-A-C1 |
| ... | ... | ... | ... | ... | ... |
+---------+--------------+-------------+-------------+------------+-----------+
If Difference > 0, the Cluster column should stay NULL.
Here is a script that provides the data as a CTE:
with t1 as
(
select 'D934' as BIGroup ,'A' as VarianceName , 11426777 as PartNumber1, 11426777 as PartNumber2, 0 as Difference, null as Cluster
union select 'D934' ,'A' , 11426777 , 11426781 , 0 , null
union select 'D934' ,'A' , 11426777 , 12542804 , 2 , null
union select 'D934' ,'A' , 11426777 , 12554759 , 4 , null
union select 'D934' ,'A' , 11426777 , 12564258 , 0 , null
union select 'D934' ,'A' , 11426781 , 11426777 , 0 , null
union select 'D934' ,'A' , 11426781 , 11426781 , 0 , null
union select 'D934' ,'A' , 11426781 , 12542804 , 5 , null
union select 'D934' ,'A' , 11426781 , 12554759 , 1 , null
union select 'D934' ,'A' , 11426781 , 12564258 , 0 , null
union select 'D934' ,'A' , 12542804 , 11426777 , 2 , null
union select 'D934' ,'A' , 12542804 , 11426781 , 5 , null
union select 'D934' ,'A' , 12542804 , 12542804 , 0 , null
union select 'D934' ,'A' , 12542804 , 12554759 , 0 , null
union select 'D934' ,'A' , 12542804 , 12564258 , 8 , null
union select 'D934' ,'A' , 12554759 , 11426777 , 4 , null
union select 'D934' ,'A' , 12554759 , 11426781 , 1 , null
union select 'D934' ,'A' , 12554759 , 12542804 , 0 , null
union select 'D934' ,'A' , 12554759 , 12554759 , 0 , null
union select 'D934' ,'A' , 12554759 , 12564258 , 9 , null
union select 'D934' ,'A' , 12564258 , 11426777 , 0 , null
union select 'D934' ,'A' , 12564258 , 11426781 , 0 , null
union select 'D934' ,'A' , 12564258 , 12542804 , 8 , null
union select 'D934' ,'A' , 12564258 , 12554759 , 9 , null
union select 'D934' ,'A' , 12564258 , 12564258 , 0 , null
union select 'D934' ,'AA' , 11438878 , 11438878 , 0 , null
union select 'D934' ,'AB' , 11438924 , 11438924 , 0 , null
union select 'D934' ,'AC' , 12556213 , 12556213 , 0 , null
union select 'D934' ,'AC' , 12556213 , 12556214 , 5 , null
union select 'D934' ,'AC' , 12556214 , 12556213 , 5 , null
union select 'D934' ,'AC' , 12556214 , 12556214 , 0 , null
union select 'D955' ,'A' , 75346846 , 75346846 , 0 , null
)
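Edit:
To make the problem easier to understand, I drew the 5 part numbers of D934 A, the links between them, and the two clusters.
The links we are interested in are the black ones (they mean the difference between the part numbers is 0).
The orange links mean the difference between the part numbers is > 0.
Once the links are drawn, we can identify 2 clusters, which I circled in red.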
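The drawing described above amounts to finding the connected components of a graph whose edges are the pairs with Difference = 0. As a way to cross-check the expected Cluster values outside SQL Server, here is a minimal Python sketch using a small union-find. The function name `find_clusters` and the tuple layout of `rows` are assumptions made for this illustration, and only the D934 / A rows are included.

```python
from collections import defaultdict

# (BIGroup, VarianceName, PartNumber1, PartNumber2, Difference) for D934 / A.
# Each pair is listed once here because the table is symmetric.
rows = [
    ("D934", "A", 11426777, 11426781, 0),
    ("D934", "A", 11426777, 12542804, 2),
    ("D934", "A", 11426777, 12554759, 4),
    ("D934", "A", 11426777, 12564258, 0),
    ("D934", "A", 11426781, 12542804, 5),
    ("D934", "A", 11426781, 12554759, 1),
    ("D934", "A", 11426781, 12564258, 0),
    ("D934", "A", 12542804, 12554759, 0),
    ("D934", "A", 12542804, 12564258, 8),
    ("D934", "A", 12554759, 12564258, 9),
]


def find_clusters(rows):
    """Return {(BIGroup, VarianceName, PartNumber): cluster code}."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for group, variance, p1, p2, diff in rows:
        a = find((group, variance, p1))
        b = find((group, variance, p2))
        if diff == 0 and a != b:
            parent[max(a, b)] = min(a, b)  # smaller part number becomes the root

    # Number the roots in ascending order inside each (BIGroup, VarianceName).
    roots = defaultdict(set)
    for node in list(parent):
        roots[node[:2]].add(find(node))
    number = {root: n
              for members in roots.values()
              for n, root in enumerate(sorted(members), start=1)}

    return {node: f"{node[0]}-{node[1]}-C{number[find(node)]}"
            for node in list(parent)}


labels = find_clusters(rows)
for (group, variance, part), code in sorted(labels.items()):
    print(part, code)
```

Unlike a comparison of direct partners only, this also merges parts that are linked through a chain of zero differences, which may or may not be what is wanted if the data is not transitive.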
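You can use DENSE_RANK to generate a number for each cluster.
Concatenating that rank with BIGroup and VarianceName then gives you a cluster code.
The difficulty is finding something that the rows of a cluster have in common.
The query below uses a trick: for each PartNumber1 it calculates the minimum and the sum of the PartNumber2 values that have a Difference of 0,
and then ranks on those two values with DENSE_RANK.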
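;WITH CTE1 AS
(
    -- For every PartNumber1, summarise the PartNumber2 values it matches exactly (Difference = 0)
    SELECT *
         , P2Min0 = MIN(CASE WHEN Difference = 0 THEN PartNumber2 END)
                    OVER (PARTITION BY BIGroup, VarianceName, PartNumber1)
         , P2Sum0 = SUM(CASE WHEN Difference = 0 THEN PartNumber2 END)
                    OVER (PARTITION BY BIGroup, VarianceName, PartNumber1)
    FROM t1
)
, CTE2 AS
(
    -- Rows that share the same (min, sum) pair get the same rank within a BIGroup / VarianceName
    SELECT *
         , Rnk = DENSE_RANK()
                 OVER (PARTITION BY BIGroup, VarianceName ORDER BY P2Min0, P2Sum0)
    FROM CTE1
    WHERE Difference = 0
)
-- '-C' prefix so that the code matches the expected format, e.g. D934-A-C1
UPDATE CTE2
SET Cluster = CONCAT(BIGroup, '-', VarianceName, '-C', Rnk);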
A test on db<>fiddle here
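To see why the (minimum, sum) pair can serve as a cluster key, the same computation can be reproduced outside the database. The Python sketch below is only an illustration of the idea, not the answer's T-SQL: the function name `signature_ranks` and the tuple layout are assumptions, and only the Difference = 0 pairs of D934 / A are listed. It relies on the zero-difference relation being transitive, so that every part of a cluster sees the same set of partners.

```python
from collections import defaultdict

# (BIGroup, VarianceName, PartNumber1, PartNumber2) pairs with Difference = 0,
# copied from the D934 / A rows of the table (self-pairs included).
zero_pairs = [
    ("D934", "A", 11426777, 11426777),
    ("D934", "A", 11426777, 11426781),
    ("D934", "A", 11426777, 12564258),
    ("D934", "A", 11426781, 11426777),
    ("D934", "A", 11426781, 11426781),
    ("D934", "A", 11426781, 12564258),
    ("D934", "A", 12542804, 12542804),
    ("D934", "A", 12542804, 12554759),
    ("D934", "A", 12554759, 12542804),
    ("D934", "A", 12554759, 12554759),
    ("D934", "A", 12564258, 11426777),
    ("D934", "A", 12564258, 11426781),
    ("D934", "A", 12564258, 12564258),
]


def signature_ranks(pairs):
    """Return {(BIGroup, VarianceName, PartNumber1): dense rank of its (min, sum)}."""
    partners = defaultdict(list)
    for group, variance, p1, p2 in pairs:
        partners[(group, variance, p1)].append(p2)

    # Counterpart of P2Min0 / P2Sum0: one (min, sum) signature per PartNumber1.
    signature = {key: (min(p2s), sum(p2s)) for key, p2s in partners.items()}

    # Counterpart of DENSE_RANK() partitioned by (BIGroup, VarianceName):
    # equal signatures share a rank, and ranks have no gaps.
    per_group = defaultdict(set)
    for key, sig in signature.items():
        per_group[key[:2]].add(sig)
    dense = {(gv, sig): n
             for gv, sigs in per_group.items()
             for n, sig in enumerate(sorted(sigs), start=1)}

    return {key: dense[(key[:2], sig)] for key, sig in signature.items()}


ranks = signature_ranks(zero_pairs)
for (group, variance, part), rank in sorted(ranks.items()):
    print(f"{part} -> {group}-{variance}-C{rank}")
```

If the data contained a chain such as A = B and B = C but A <> C, the three parts would get different signatures, so this approach would not merge them.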
Try this:
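;WITH cte_p(BIGroup, VarianceName, PartNumber1, PartNumber2)
AS
(
    -- Pairs of identical parts only
    SELECT BIGroup, VarianceName, PartNumber1, PartNumber2
    FROM t1
    WHERE [Difference]=0
),
cte_c(BIGroup, VarianceName, PartNumber1, PartNumber2, COrder)
AS
(
    -- Keep the pairs whose PartNumber1 is the smallest part of its cluster
    -- (no smaller identical part exists) and number those representatives.
    -- Assumes the table is symmetric and Difference = 0 is transitive.
    SELECT p1.BIGroup, p1.VarianceName, p1.PartNumber1, p1.PartNumber2,
           DENSE_RANK() OVER (PARTITION BY p1.BIGroup, p1.VarianceName ORDER BY p1.PartNumber1) AS COrder
    FROM cte_p p1
    WHERE NOT EXISTS(SELECT 1 FROM cte_p p2
                     WHERE p2.PartNumber1<p2.PartNumber2
                       AND p1.BIGroup=p2.BIGroup
                       AND p1.VarianceName=p2.VarianceName
                       AND p1.PartNumber1=p2.PartNumber2)
)
-- Each part is the PartNumber2 of exactly one representative row;
-- rows with Difference > 0 find no match, so their cluster code stays NULL
SELECT t.*,t.BIGroup+'-'+t.VarianceName+'-C'+CAST(c.COrder AS nvarchar(20)) AS ClusterCode
FROM t1 t
LEFT JOIN cte_c c
    ON t.BIGroup=c.BIGroup
    AND t.VarianceName=c.VarianceName
    AND t.PartNumber1=c.PartNumber2
    AND t.[Difference]=0;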
I managed to solve this with a stored procedure:
DECLARE @BiGroup [nvarchar](30);
DECLARE @VarianceName [nvarchar](30);
DECLARE @NewBiGroup [nvarchar](30);
DECLARE @NewVarianceName [nvarchar](30);
DECLARE @PartNumber [nvarchar](30);
DECLARE @ClusterName [nvarchar](100);
DECLARE @IncrementClusterName int;
-- Sentinel values that match no real BIGroup / VarianceName
set @BiGroup = 'first_BiGroup';
set @VarianceName = 'first_VarianceName';
set @IncrementClusterName = 1;
set @ClusterName = null;
-- Declare cursor: one row per part, read from a snapshot because t1 is updated inside the loop
DECLARE cur CURSOR LOCAL STATIC READ_ONLY FOR
    SELECT DISTINCT [BIGroup], [VarianceName], [PartNumber1] FROM t1
    ORDER BY [BIGroup], [VarianceName], [PartNumber1];
-- Clean cluster column
update t1 set [Cluster]=null;
OPEN cur
FETCH NEXT FROM cur INTO @NewBiGroup, @NewVarianceName, @PartNumber
-- Loop on every PartNumber
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Check if we are still in the same BIGroup and VarianceName, otherwise reset the cluster increment
    if @NewBiGroup <> @BiGroup or @NewVarianceName <> @VarianceName
    BEGIN
        set @IncrementClusterName = 1;
    END
    -- Get the cluster already given to an identical part, if it exists
    set @ClusterName = (select Top(1) [Cluster] from t1
                        where BIGroup = @NewBiGroup and VarianceName = @NewVarianceName
                          and partnumber2 = @PartNumber and [Cluster] is not null);
    -- If @ClusterName is NULL, create a new cluster code and then increment @IncrementClusterName,
    -- otherwise reuse @ClusterName as it is
    if @ClusterName is null
    BEGIN
        update t1 set [Cluster] = @NewBiGroup+'-'+@NewVarianceName+'-C'+CAST(@IncrementClusterName AS nvarchar(10))
        where BIGroup = @NewBiGroup and VarianceName = @NewVarianceName
          and partnumber1 = @PartNumber
          and Difference = 0;
        set @IncrementClusterName = @IncrementClusterName + 1;
    END
    else
    BEGIN
        update t1 set [Cluster] = @ClusterName
        where BIGroup = @NewBiGroup and VarianceName = @NewVarianceName
          and partnumber1 = @PartNumber
          and Difference = 0;
    END
    -- Remember the current BIGroup and VarianceName for the next iteration
    set @BiGroup = @NewBiGroup;
    set @VarianceName = @NewVarianceName;
    FETCH NEXT FROM cur INTO @NewBiGroup, @NewVarianceName, @PartNumber
END
CLOSE cur
DEALLOCATE cur
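The algorithm of the stored procedure is:
- For each PartNumber, looking only at the rows where Difference = 0:
  - If the BIGroup or VarianceName has changed:
    - I reset @IncrementClusterName to 1
  - If the part does not belong to a Cluster yet:
    - I set its Cluster to @IncrementClusterName
    - @IncrementClusterName = @IncrementClusterName + 1
  - If the part already belongs to a Cluster:
    - I set its Cluster to the existing Cluster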
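The steps above can be traced on the sample data with a short Python sketch. It mirrors the loop rather than the T-SQL itself: the function name `assign_clusters` and the dictionary layout are assumptions for illustration, and the input is expected to list each zero-difference pair in both directions, as the table does.

```python
# (BIGroup, VarianceName, PartNumber1, PartNumber2) pairs with Difference = 0,
# copied from the D934 / A and D934 / AA rows of the table.
zero_pairs = [
    ("D934", "A", 11426777, 11426777),
    ("D934", "A", 11426777, 11426781),
    ("D934", "A", 11426777, 12564258),
    ("D934", "A", 11426781, 11426777),
    ("D934", "A", 11426781, 11426781),
    ("D934", "A", 11426781, 12564258),
    ("D934", "A", 12542804, 12542804),
    ("D934", "A", 12542804, 12554759),
    ("D934", "A", 12554759, 12542804),
    ("D934", "A", 12554759, 12554759),
    ("D934", "A", 12564258, 11426777),
    ("D934", "A", 12564258, 11426781),
    ("D934", "A", 12564258, 12564258),
    ("D934", "AA", 11438878, 11438878),
]


def assign_clusters(pairs):
    """Return {(BIGroup, VarianceName, PartNumber): cluster code}."""
    cluster = {}
    previous = None   # (BIGroup, VarianceName) of the previous part
    increment = 1     # next cluster number inside the current group

    # Visit each part once, ordered by BIGroup, VarianceName, PartNumber1.
    for group, variance, part in sorted({(g, v, p1) for g, v, p1, _ in pairs}):
        # New BIGroup / VarianceName: restart the numbering.
        if (group, variance) != previous:
            increment = 1

        # Cluster of an identical part that was already processed, if any.
        existing = next(
            (cluster[(g, v, p1)] for g, v, p1, p2 in pairs
             if (g, v, p2) == (group, variance, part) and (g, v, p1) in cluster),
            None,
        )

        if existing is None:
            cluster[(group, variance, part)] = f"{group}-{variance}-C{increment}"
            increment += 1
        else:
            cluster[(group, variance, part)] = existing

        previous = (group, variance)
    return cluster


for (group, variance, part), code in sorted(assign_clusters(zero_pairs).items()):
    print(part, code)
```

Because each part simply copies the cluster of the first identical part found, the result depends on the zero-difference relation being transitive, as it is in the sample.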