Proper Set operation to find a matching set in a set of sets, or full join?
TLDR
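How do I match a set of sets against a single set and bind the result to the corresponding rows?
Given a row that has a linked summary table with key/value pairs describing properties of that row, and a bunch of search descriptions (targets) that describe how to summarize stuff from that row, how can I find which search descriptions match a given row, based on matching the property table against the key/value pairs in the search descriptions?
Simplified example: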
CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))
INSERT INTO OriginalRows VALUES
(1, 55.5, 'Some cool red coat'),
(2, 80.0, 'Some cool green coat XL'),
(3, 250.00, 'Some cool green coat L'),
(4, 100.0, 'Some whiskey'),
(5, 42.0, 'This is not a match')
INSERT INTO RowKeyValue VALUES
(1, 'Color', 'Red'),
(1, 'Size', 'XL'),
(1, 'Kind', 'Coat'),
(2, 'Color', 'Green'),
(2, 'Size', 'XL'),
(2, 'Kind', 'Coat'),
(3, 'Color', 'Green'),
(3, 'Size', 'L'),
(3, 'Kind', 'Coat'),
(4, 'Color', 'Green'),
(4, 'Size', 'Medium'),
(4, 'Kind', 'Whiskey')
INSERT INTO TargetKeyValue VALUES
(55, 'Color', 'Red'),
(56, 'Color', 'Green'),
(56, 'Size', 'XL'),
(57, 'Kind', 'Coat'),
(58, 'Color', 'Green'),
(58, 'Size', 'Medium'),
(58, 'Kind', 'Whiskey')
This gives the following tables:
-- table OriginalRows
Id Cost BunchOfOtherCols
1 56 Some cool red coat
2 80 Some cool green coat XL
3 250 Some cool green coat L
4 100 Some whiskey
5 42 This is not a match
-- table RowKeyValue
RowId KeyPart ValuePart
1 Color Red
1 Size XL
1 Kind Coat
2 Color Green
2 Size XL
2 Kind Coat
3 Color Green
3 Size L
3 Kind Coat
4 Color Green
4 Size Medium
4 Kind Whiskey
-- table TargetKeyValue
TargetId TargetKey TargetValue
55 Color Red
56 Color Green
56 Size XL
57 Kind Coat
58 Color Green
58 Size Medium
58 Kind Whiskey
Expected result
The function below gives the correct result:
Id Cost BunchOfOtherCols IsTargetMatch TargetKeyId
1 56 Some cool red coat 1 55
2 80 Some cool green coat XL 1 56
3 250 Some cool green coat L 1 57
4 100 Some whiskey 1 58
5 42 This is not a match 0 NULL
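In other words:
- bind each original row id to the first target id it matches (I'm fine with a join returning multiple matches if that is easier)
- show the original row even when it does not match
- a match is true if the whole set of key/value pairs belonging to one target id matches the same values of the given original row
Current approach with a cursor... alas
The code below uses a cursor, but this proved slow (understandably, as it is basically non-indexed table scans over and over).
Another approach I tried was XML PATH queries, but that turned out to be a no-go (it was easy, but far too slow).
I'm aware this is a non-trivial task in relational databases, but I'm hoping there is still a fairly straightforward solution. What I have below kind of works, and I may just use batch processing to store the results or something, unless there is a better way to do this with SET operations or, I dunno, FULL JOIN?
Any solution that can be used in a view (i.e. not involving dynamic SQL or calling SPs) is fine. We used to have an SP-based solution, but since the data needs to be analyzed in PowerBI and other systems, SQL views and determinism are the way to go.
Here is a fully working minimal example of what I'm after. The function is the part I'm looking to replace with something less procedural and more functional, i.e. a set-based approach: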
CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))
INSERT INTO OriginalRows VALUES
(1, 55.5, 'Some cool red coat'),
(2, 80.0, 'Some cool green coat XL'),
(3, 250.00, 'Some cool green coat L'),
(4, 100.0, 'Some whiskey'),
(5, 42.0, 'This is not a match')
INSERT INTO RowKeyValue VALUES
(1, 'Color', 'Red'),
(1, 'Size', 'XL'),
(1, 'Kind', 'Coat'),
(2, 'Color', 'Green'),
(2, 'Size', 'XL'),
(2, 'Kind', 'Coat'),
(3, 'Color', 'Green'),
(3, 'Size', 'L'),
(3, 'Kind', 'Coat'),
(4, 'Color', 'Green'),
(4, 'Size', 'Medium'),
(4, 'Kind', 'Whiskey')
INSERT INTO TargetKeyValue VALUES
(55, 'Color', 'Red'),
(56, 'Color', 'Green'),
(56, 'Size', 'XL'),
(57, 'Kind', 'Coat'),
(58, 'Color', 'Green'),
(58, 'Size', 'Medium'),
(58, 'Kind', 'Whiskey')
GO
CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
@rowid INT
)
RETURNS @MatchResults TABLE(
IsTargetMatch BIT,
TargetKeyId INT)
AS
BEGIN
--
-- METHOD (1) (faster, by materializing the xml field into a cross-over lookup table)
--
-- single row from activities as key/value pairs multi-row
DECLARE @rowAsKeyValue AS TABLE(KeyPart NVARCHAR(1000), ValuePart NVARCHAR(MAX))
INSERT INTO @rowAsKeyValue (KeyPart, ValuePart)
SELECT KeyPart, ValuePart FROM RowKeyValue WHERE RowId = @rowid
DECLARE @LookupColumn NVARCHAR(100)
DECLARE @LookupValue NVARCHAR(max)
DECLARE @TargetId INT
DECLARE @CurrentTargetId INT
DECLARE @IsMatch INT
DECLARE key_Cursor CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR SELECT TargetKey, TargetValue, TargetId FROM TargetKeyValue ORDER BY TargetId
OPEN key_Cursor
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
WHILE @@FETCH_STATUS = 0
BEGIN
SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
IF(@IsMatch = 0)
BEGIN
-- move to next key that isn't the current key
SET @CurrentTargetId = @TargetId
WHILE @@FETCH_STATUS = 0 AND @CurrentTargetId = @TargetId
BEGIN
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
END
END
ELSE
BEGIN
SET @CurrentTargetId = @TargetId
WHILE @@FETCH_STATUS = 0 AND @IsMatch > 0 AND @CurrentTargetId = @TargetId
BEGIN
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
IF @CurrentTargetId = @TargetId
SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
END
IF @IsMatch > 0
BEGIN
-- we found a positive matching key, nothing more to do
BREAK
END
END
END
DEALLOCATE key_Cursor -- deallocating a cursor also closes it
INSERT @MatchResults
SELECT
(CASE WHEN (SELECT COUNT(*) FROM @rowAsKeyValue) > 0 THEN 1 ELSE 0 END),
(CASE WHEN @IsMatch > 0 THEN @CurrentTargetId ELSE NULL END)
RETURN
END
GO
-- function in action
select * from OriginalRows
cross apply dbo.MatchTargetAgainstKeysFromRow(Id) fn
-- cleanup
drop function dbo.MatchTargetAgainstKeysFromRow
drop table TargetKeyValue
drop table RowKeyValue
drop table OriginalRows
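This question is a case of Relational Division With Remainder, with multiple dividends.
Relational division is basically the opposite of a join: in this case, we want to know which OriginalRows match which TargetIds, based on every key/value pair of a TargetId matching a key/value pair of the OriginalRows row.
There are many ways to do this; here are a few: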
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
t.TargetId
FROM OriginalRows r
OUTER APPLY (
SELECT tKV.TargetId
FROM TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
GROUP BY tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
) t;
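To see why the HAVING test in the query above identifies a full match, it can help to look at the two counts side by side for a single row. The following is only a diagnostic sketch against the sample tables from the question; the fixed RowId = 2 and the column aliases TargetPairs and MatchedPairs are illustrative choices, not part of the original answer.

```sql
-- Diagnostic sketch: per-target pair counts for one sample row
-- (RowId = 2 is an arbitrary choice for illustration).
SELECT
    tKV.TargetId,
    COUNT(*)         AS TargetPairs,  -- every key/value pair the target requires
    COUNT(rKV.RowId) AS MatchedPairs  -- pairs that found a partner; NULLs from the LEFT JOIN are skipped
FROM TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
    ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
    AND rKV.RowId = 2
GROUP BY tKV.TargetId
ORDER BY tKV.TargetId;
```

On the sample data, targets 56 and 57 come out with equal counts (2 of 2 and 1 of 1), while 55 has 0 of 1 and 58 has 1 of 3, so only 56 and 57 pass the HAVING filter for this row. The remaining variants below apply the same test with different join shapes.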
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
GROUP BY
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
CROSS APPLY (VALUES (CASE WHEN EXISTS (SELECT 1
FROM RowKeyValue rKV
WHERE rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
) THEN 1 END
) ) rKV(IsMatch)
GROUP BY
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.IsMatch) -- all target k/vs have match
You can also replace HAVING COUNT(*) = COUNT(rKV.RowId) with:
HAVING COUNT(CASE WHEN rKV.RowId IS NULL THEN 1 END) = 0 -- all target k/vs have match
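Another common way to phrase relational division, not shown in the queries above, is a double NOT EXISTS: a target matches a row when there is no target pair that the row lacks. The sketch below assumes the same sample tables; the aliases t and m are introduced here for illustration only.

```sql
-- Sketch of the double NOT EXISTS formulation of relational division.
SELECT
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    m.TargetId
FROM OriginalRows r
OUTER APPLY (
    SELECT t.TargetId
    FROM (SELECT DISTINCT TargetId FROM TargetKeyValue) t
    WHERE NOT EXISTS (
        -- a pair required by this target ...
        SELECT 1
        FROM TargetKeyValue tKV
        WHERE tKV.TargetId = t.TargetId
          AND NOT EXISTS (
              -- ... that the current row does not have
              SELECT 1
              FROM RowKeyValue rKV
              WHERE rKV.RowId = r.Id
                AND rKV.KeyPart = tKV.TargetKey
                AND rKV.ValuePart = tKV.TargetValue
          )
    )
) m;
```

Because it uses OUTER APPLY, rows without any matching target are still returned with a NULL TargetId. Whether this performs better or worse than the counting variants probably depends on the data and the available indexes, so it is worth comparing execution plans.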
If you want a function for a single OriginalRows row, it is even simpler:
CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
@rowid INT
)
RETURNS TABLE
AS RETURN (
SELECT
tKV.TargetId
FROM TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = @rowid
GROUP BY
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
);
GO
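Since the question asks for something usable in a view and for only the first matching target per row, the inline function could be wrapped roughly as follows. This is a sketch under assumptions: the view name dbo.OriginalRowsWithTarget is hypothetical, "first match" is taken to mean the lowest TargetId (mirroring the cursor's ORDER BY TargetId), and IsTargetMatch is taken to mean "some target matched".

```sql
CREATE VIEW dbo.OriginalRowsWithTarget  -- hypothetical view name
AS
SELECT
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    -- assumption: the flag means that at least one target matched
    CAST(CASE WHEN m.TargetKeyId IS NULL THEN 0 ELSE 1 END AS BIT) AS IsTargetMatch,
    m.TargetKeyId
FROM OriginalRows r
OUTER APPLY (
    -- MIN always yields exactly one row, NULL when no target matches
    SELECT MIN(fn.TargetId) AS TargetKeyId
    FROM dbo.MatchTargetAgainstKeysFromRow(r.Id) fn
) m;
GO
-- usage
SELECT * FROM dbo.OriginalRowsWithTarget ORDER BY Id;
```

On the sample data this should reproduce the expected result table from the question: rows 1 to 4 bound to targets 55, 56, 57 and 58, and row 5 returned with IsTargetMatch = 0 and a NULL TargetKeyId.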