Proper Set operation to find a matching set in a set of sets, or full join?

TLDR

How do I match a set of sets against a single set and bind the match to the corresponding row?

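Given a row that has a linked summary table of key/value pairs describing the row's attributes, and a bunch of search-description (target) rows that describe how to summarize things from it, how do I find the search description that matches a given row, based on matching the attribute table against the key/value pairs in the search descriptions?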

Simplified example:

CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
    CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
     CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))

INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'),
    (2, 80.0, 'Some cool green coat XL'),
    (3, 250.00, 'Some cool green coat L'),
    (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match')

INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'),
    (1, 'Size', 'XL'),
    (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'),
    (2, 'Size', 'XL'),
    (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'),
    (3, 'Size', 'L'),
    (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'),
    (4, 'Size', 'Medium'),
    (4, 'Kind', 'Whiskey')


INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'),
    (56, 'Color', 'Green'),
    (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'),
    (58, 'Color', 'Green'),
    (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey')

This gives the following tables:


-- table OriginalRows
Id  Cost    BunchOfOtherCols
1   56      Some cool red coat
2   80      Some cool green coat XL
3   250     Some cool green coat L
4   100     Some whiskey
5   42      This is not a match

-- table RowKeyValue
RowId   KeyPart ValuePart
1       Color   Red
1       Size    XL
1       Kind    Coat
2       Color   Green
2       Size    XL
2       Kind    Coat
3       Color   Green
3       Size    L
3       Kind    Coat
4       Color   Green
4       Size    Medium
4       Kind    Whiskey

-- table TargetKeyValue
TargetId    TargetKey   TargetValue
55          Color       Red
56          Color       Green
56          Size        XL
57          Kind        Coat
58          Color       Green
58          Size        Medium
58          Kind        Whiskey

Expected result

The function below gives the correct result:

Id  Cost    BunchOfOtherCols            IsTargetMatch   TargetKeyId
1   56      Some cool red coat          1               55
2   80      Some cool green coat XL     1               56
3   250     Some cool green coat L      1               57
4   100     Some whiskey                1               58
5   42      This is not a match         0               NULL

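In other words: a row matches a target when every key/value pair of that target is also among the row's key/value pairs (the row may have extra pairs), and when several targets match, the one with the lowest TargetId is returned.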
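The matching rule behind the expected result can be sketched in plain Python with sets: a target matches a row when the target's key/value pairs are a subset of the row's. This is only an illustration, not the author's code. `ROW_KV` and `TARGET_KV` are hypothetical in-memory copies of the sample data, and taking the lowest TargetId when several targets match is an assumption that mirrors the cursor's `ORDER BY TargetId`.

```python
# Hypothetical in-memory copies of the sample data: id -> set of (key, value) pairs.
ROW_KV = {
    1: {("Color", "Red"), ("Size", "XL"), ("Kind", "Coat")},
    2: {("Color", "Green"), ("Size", "XL"), ("Kind", "Coat")},
    3: {("Color", "Green"), ("Size", "L"), ("Kind", "Coat")},
    4: {("Color", "Green"), ("Size", "Medium"), ("Kind", "Whiskey")},
    5: set(),  # row 5 has no key/value pairs at all
}
TARGET_KV = {
    55: {("Color", "Red")},
    56: {("Color", "Green"), ("Size", "XL")},
    57: {("Kind", "Coat")},
    58: {("Color", "Green"), ("Size", "Medium"), ("Kind", "Whiskey")},
}


def first_matching_target(row_pairs, targets):
    """Return the lowest TargetId whose pairs are all present in row_pairs, else None.

    Assumption: ties are broken by TargetId order, like the cursor's ORDER BY TargetId.
    A target with no pairs at all would match every row.
    """
    for target_id in sorted(targets):
        if targets[target_id] <= row_pairs:  # set subset test
            return target_id
    return None


def match_rows(rows, targets):
    """Return (Id, IsTargetMatch, TargetKeyId) tuples shaped like the expected result."""
    result = []
    for row_id in sorted(rows):
        target_id = first_matching_target(rows[row_id], targets)
        result.append((row_id, 0 if target_id is None else 1, target_id))
    return result


if __name__ == "__main__":
    for row in match_rows(ROW_KV, TARGET_KV):
        print(row)
```

Row 1 also satisfies target 57, but 55 comes first in TargetId order, which is why the expected result shows 55.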

Current approach using a cursor... ugh

The code below uses a cursor, but it turned out to be slow (which is understandable, since it is basically one non-indexed table scan after another).

Another approach I tried used an XML PATH query, but that turned out to be a non-starter too (simple, but also too slow).

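I know this is a non-trivial task in a relational database, but I'm hoping there is still a fairly simple solution. What I have below sort of works, and I might just run it as a batch job and store the results, unless there's a better way using SET operations or, I dunno, a FULL JOIN?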

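Any solution that can be used in a view (i.e. one that doesn't involve dynamic SQL or calling an SP) is fine. We used to have an SP-based solution, but because the data needs to be analyzed in PowerBI and other systems, SQL views and determinism are the way to go.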

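Here is a fully working minimal example of what I'm after. The function is the part I'm looking to replace with something less procedural and more functional, i.e. a set-based approach: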

CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
    CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
     CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))

INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'),
    (2, 80.0, 'Some cool green coat XL'),
    (3, 250.00, 'Some cool green coat L'),
    (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match')

INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'),
    (1, 'Size', 'XL'),
    (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'),
    (2, 'Size', 'XL'),
    (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'),
    (3, 'Size', 'L'),
    (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'),
    (4, 'Size', 'Medium'),
    (4, 'Kind', 'Whiskey')


INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'),
    (56, 'Color', 'Green'),
    (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'),
    (58, 'Color', 'Green'),
    (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey')

GO


CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
    @rowid INT
)
RETURNS @MatchResults TABLE(
    IsTargetMatch BIT,
    TargetKeyId INT)

AS
BEGIN

    --
    -- METHOD (1) (faster, by materializing the xml field into a cross-over lookup table)
    --

    -- single row from activities as key/value pairs multi-row
    DECLARE @rowAsKeyValue AS TABLE(KeyPart NVARCHAR(1000), ValuePart NVARCHAR(MAX))
    INSERT INTO @rowAsKeyValue (KeyPart, ValuePart)
        SELECT KeyPart, ValuePart FROM RowKeyValue WHERE RowId = @rowid


    DECLARE @LookupColumn NVARCHAR(100)
    DECLARE @LookupValue NVARCHAR(max)
    DECLARE @TargetId INT
    DECLARE @CurrentTargetId INT
    DECLARE @IsMatch INT
    DECLARE key_Cursor CURSOR
        LOCAL STATIC FORWARD_ONLY READ_ONLY
        FOR SELECT TargetKey, TargetValue, TargetId FROM TargetKeyValue  ORDER BY TargetId

    OPEN key_Cursor
    FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
        IF(@IsMatch = 0)
        BEGIN
            -- move to next key that isn't the current key
            SET @CurrentTargetId = @TargetId
            WHILE @@FETCH_STATUS = 0 AND @CurrentTargetId = @TargetId
            BEGIN
                FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
            END
        END
        ELSE
            BEGIN
                SET @CurrentTargetId = @TargetId
                WHILE @@FETCH_STATUS = 0 AND @IsMatch > 0 AND @CurrentTargetId = @TargetId
                BEGIN
                    FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
                    IF @CurrentTargetId = @TargetId
                        SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
                END
                IF @IsMatch > 0
                BEGIN
                    -- we found a positive matching key, nothing more to do
                    BREAK
                END
            END
    END

    DEALLOCATE key_Cursor       -- deallocating a cursor also closes it

    INSERT @MatchResults
    SELECT
        (CASE WHEN @IsMatch > 0 THEN 1 ELSE 0 END),                  -- IsTargetMatch: 1 only when a target matched
        (CASE WHEN @IsMatch > 0 THEN @CurrentTargetId ELSE NULL END)  -- TargetKeyId of the first matching target

    RETURN
END

GO

-- function in action
select * from OriginalRows
cross apply dbo.MatchTargetAgainstKeysFromRow(Id) fn

-- cleanup
drop function dbo.MatchTargetAgainstKeysFromRow
drop table TargetKeyValue
drop table RowKeyValue
drop table OriginalRows

This question is an example of Relational Division With Remainder, with multiple dividends.

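Relational division is basically the opposite of a join: in this case we want to know which OriginalRows match which TargetIds, based on every key/value pair of a TargetId matching a key/value pair of the OriginalRows row.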
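To make the counting trick used by the queries in this answer concrete, here is a minimal Python sketch of relational division with remainder over several dividends. It works on hypothetical flat lists of `(id, key, value)` tuples standing in for RowKeyValue and TargetKeyValue: for each (row, target) combination it counts the target's pairs and how many of them the row also has, and keeps the combination when the two counts agree. Pairs the row has beyond the target's are the "remainder" and are simply ignored.

```python
from collections import defaultdict

# Hypothetical flat copies of the two key/value tables as (id, key, value) tuples.
ROW_KEY_VALUE = [
    (1, "Color", "Red"), (1, "Size", "XL"), (1, "Kind", "Coat"),
    (2, "Color", "Green"), (2, "Size", "XL"), (2, "Kind", "Coat"),
    (3, "Color", "Green"), (3, "Size", "L"), (3, "Kind", "Coat"),
    (4, "Color", "Green"), (4, "Size", "Medium"), (4, "Kind", "Whiskey"),
]
TARGET_KEY_VALUE = [
    (55, "Color", "Red"),
    (56, "Color", "Green"), (56, "Size", "XL"),
    (57, "Kind", "Coat"),
    (58, "Color", "Green"), (58, "Size", "Medium"), (58, "Kind", "Whiskey"),
]


def divide(row_kv, target_kv):
    """Relational division with remainder and multiple dividends, done by counting.

    For every (row, target) combination, `total` plays the role of COUNT(*) (the
    target's pairs) and `matched` the role of COUNT(rKV.RowId) (the pairs the row
    also has). A combination is kept when the two counts are equal.
    """
    row_pairs = defaultdict(set)
    for row_id, key, value in row_kv:
        row_pairs[row_id].add((key, value))

    total = defaultdict(int)
    matched = defaultdict(int)
    for row_id, pairs in row_pairs.items():
        for target_id, key, value in target_kv:
            total[(row_id, target_id)] += 1
            if (key, value) in pairs:
                matched[(row_id, target_id)] += 1

    return sorted(combo for combo, n in total.items() if matched[combo] == n)


if __name__ == "__main__":
    print(divide(ROW_KEY_VALUE, TARGET_KEY_VALUE))
```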

There are many ways to do this; here are a few:

-- 1) OUTER APPLY: every OriginalRows row is kept; a row with no matching target gets a NULL TargetId,
--    and a row matching several targets appears once per target
SELECT
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    t.TargetId
FROM OriginalRows r
OUTER APPLY (
    SELECT tKV.TargetId
    FROM TargetKeyValue tKV
    LEFT JOIN RowKeyValue rKV
        ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
        AND rKV.RowId = r.Id
    GROUP BY tKV.TargetId
    HAVING COUNT(*) = COUNT(rKV.RowId)  -- all target k/vs have match
) t;

-- 2) CROSS JOIN + LEFT JOIN + GROUP BY: returns only the matching (row, target) pairs
SELECT
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
    ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
    AND rKV.RowId = r.Id
GROUP BY
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId);  -- all target k/vs have match

-- 3) Like 2), but each target k/v is checked with EXISTS instead of a LEFT JOIN
SELECT
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
CROSS APPLY (VALUES (CASE WHEN EXISTS (SELECT 1
        FROM RowKeyValue rKV
            WHERE rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
            AND rKV.RowId = r.Id
        ) THEN 1 END
) ) rKV(IsMatch)
GROUP BY
    r.Id,
    r.Cost,
    r.BunchOfOtherCols,
    tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.IsMatch);  -- all target k/vs have match
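To check that the LEFT JOIN version and the EXISTS version agree, here is a sketch that runs both against the sample data in an in-memory SQLite database through Python's standard `sqlite3` module. It is a simplified port, not the T-SQL originals: column types are simplified, and because SQLite has no `CROSS APPLY`, the EXISTS version computes the same `CASE WHEN EXISTS ... THEN 1 END` flag in a derived table instead.

```python
import sqlite3

# Sample data from the question, ported to SQLite with simplified column types.
SETUP = """
CREATE TABLE OriginalRows(Id INTEGER PRIMARY KEY, Cost REAL, BunchOfOtherCols TEXT);
CREATE TABLE RowKeyValue(RowId INTEGER REFERENCES OriginalRows(Id), KeyPart TEXT, ValuePart TEXT);
CREATE TABLE TargetKeyValue(TargetId INTEGER, TargetKey TEXT, TargetValue TEXT);
INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'), (2, 80.0, 'Some cool green coat XL'),
    (3, 250.0, 'Some cool green coat L'), (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match');
INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'), (1, 'Size', 'XL'), (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'), (2, 'Size', 'XL'), (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'), (3, 'Size', 'L'), (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'), (4, 'Size', 'Medium'), (4, 'Kind', 'Whiskey');
INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'), (56, 'Color', 'Green'), (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'), (58, 'Color', 'Green'), (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey');
"""

# LEFT JOIN version: attach the row's pairs to every target pair, then compare counts.
LEFT_JOIN_QUERY = """
SELECT r.Id, tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
    ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
    AND rKV.RowId = r.Id
GROUP BY r.Id, tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId)
ORDER BY r.Id, tKV.TargetId
"""

# EXISTS version: the per-pair match flag lives in a derived table (no CROSS APPLY in SQLite).
EXISTS_QUERY = """
SELECT m.Id, m.TargetId
FROM (
    SELECT r.Id AS Id, tKV.TargetId AS TargetId,
           CASE WHEN EXISTS (SELECT 1 FROM RowKeyValue rKV
                             WHERE rKV.KeyPart = tKV.TargetKey
                               AND rKV.ValuePart = tKV.TargetValue
                               AND rKV.RowId = r.Id) THEN 1 END AS IsMatch
    FROM OriginalRows r
    CROSS JOIN TargetKeyValue tKV
) AS m
GROUP BY m.Id, m.TargetId
HAVING COUNT(*) = COUNT(m.IsMatch)
ORDER BY m.Id, m.TargetId
"""


def run(query):
    """Load the sample data into a fresh in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(LEFT_JOIN_QUERY))
    print(run(EXISTS_QUERY))
```

Neither query returns row 5, since it has no key/value pairs and therefore no matching group; the OUTER APPLY version is the one that keeps it with a NULL TargetId.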

You can also replace

HAVING COUNT(*) = COUNT(rKV.RowId)

with

HAVING COUNT(CASE WHEN rKV.RowId IS NULL THEN 1 END) = 0 -- all target k/vs have match
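The two `HAVING` forms are equivalent because a group's `COUNT(*)` equals `COUNT(rKV.RowId)` exactly when none of its rows has a NULL `rKV.RowId`, i.e. when no target pair went unmatched. A small SQLite sketch (again a simplified port run through Python's `sqlite3`, not SQL Server) can confirm that both keep the same (row, target) pairs on the sample data.

```python
import sqlite3

# Sample data from the question, ported to SQLite with simplified column types.
SETUP = """
CREATE TABLE OriginalRows(Id INTEGER PRIMARY KEY, Cost REAL, BunchOfOtherCols TEXT);
CREATE TABLE RowKeyValue(RowId INTEGER, KeyPart TEXT, ValuePart TEXT);
CREATE TABLE TargetKeyValue(TargetId INTEGER, TargetKey TEXT, TargetValue TEXT);
INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'), (2, 80.0, 'Some cool green coat XL'),
    (3, 250.0, 'Some cool green coat L'), (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match');
INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'), (1, 'Size', 'XL'), (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'), (2, 'Size', 'XL'), (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'), (3, 'Size', 'L'), (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'), (4, 'Size', 'Medium'), (4, 'Kind', 'Whiskey');
INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'), (56, 'Color', 'Green'), (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'), (58, 'Color', 'Green'), (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey');
"""

# The CROSS JOIN / LEFT JOIN query with the HAVING clause left as a placeholder.
QUERY_TEMPLATE = """
SELECT r.Id, tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
    ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
    AND rKV.RowId = r.Id
GROUP BY r.Id, tKV.TargetId
{having}
ORDER BY r.Id, tKV.TargetId
"""

# Both reject a (row, target) group as soon as one target pair found no partner.
HAVING_COUNTS_EQUAL = "HAVING COUNT(*) = COUNT(rKV.RowId)"
HAVING_NONE_MISSING = "HAVING COUNT(CASE WHEN rKV.RowId IS NULL THEN 1 END) = 0"


def run(having):
    """Load the sample data into a fresh in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY_TEMPLATE.format(having=having)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(HAVING_COUNTS_EQUAL) == run(HAVING_NONE_MISSING))
```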

If you want a function that handles a single OriginalRows row, it's even simpler:

CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
    @rowid INT
)
RETURNS TABLE
AS RETURN (
    SELECT
        tKV.TargetId
    FROM TargetKeyValue tKV
    LEFT JOIN RowKeyValue rKV
        ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
        AND rKV.RowId = @rowid
    GROUP BY
        tKV.TargetId
    HAVING COUNT(*) = COUNT(rKV.RowId)  -- all target k/vs have match
);
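For the view requirement in the question, one possible shape, sketched here in SQLite through Python's `sqlite3` rather than in SQL Server, is to wrap the division in a view and `LEFT JOIN` it back to OriginalRows so that non-matching rows survive with `IsTargetMatch = 0`. The view name `RowTargetMatch` is hypothetical, and keeping only `MIN(TargetId)` per row is an assumption chosen to reproduce the expected result (the cursor walks targets in TargetId order).

```python
import sqlite3

# Sample data from the question, ported to SQLite with simplified column types.
SETUP = """
CREATE TABLE OriginalRows(Id INTEGER PRIMARY KEY, Cost REAL, BunchOfOtherCols TEXT);
CREATE TABLE RowKeyValue(RowId INTEGER, KeyPart TEXT, ValuePart TEXT);
CREATE TABLE TargetKeyValue(TargetId INTEGER, TargetKey TEXT, TargetValue TEXT);
INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'), (2, 80.0, 'Some cool green coat XL'),
    (3, 250.0, 'Some cool green coat L'), (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match');
INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'), (1, 'Size', 'XL'), (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'), (2, 'Size', 'XL'), (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'), (3, 'Size', 'L'), (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'), (4, 'Size', 'Medium'), (4, 'Kind', 'Whiskey');
INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'), (56, 'Color', 'Green'), (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'), (58, 'Color', 'Green'), (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey');
"""

# Hypothetical view: one row per OriginalRows row, with the lowest matching TargetId.
CREATE_VIEW = """
CREATE VIEW RowTargetMatch AS
SELECT r.Id, r.Cost, r.BunchOfOtherCols,
       CASE WHEN m.TargetId IS NULL THEN 0 ELSE 1 END AS IsTargetMatch,
       m.TargetId AS TargetKeyId
FROM OriginalRows r
LEFT JOIN (
    -- reduce the division result to one target per row (assumption: lowest TargetId wins)
    SELECT d.Id AS Id, MIN(d.TargetId) AS TargetId
    FROM (
        -- the division: all (row, target) pairs whose target k/vs all matched
        SELECT r2.Id AS Id, tKV.TargetId AS TargetId
        FROM OriginalRows r2
        CROSS JOIN TargetKeyValue tKV
        LEFT JOIN RowKeyValue rKV
            ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
            AND rKV.RowId = r2.Id
        GROUP BY r2.Id, tKV.TargetId
        HAVING COUNT(*) = COUNT(rKV.RowId)
    ) AS d
    GROUP BY d.Id
) AS m ON m.Id = r.Id;
"""


def view_rows():
    """Create the sample database plus the view and return (Id, IsTargetMatch, TargetKeyId)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP + CREATE_VIEW)
        return con.execute(
            "SELECT Id, IsTargetMatch, TargetKeyId FROM RowTargetMatch ORDER BY Id"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in view_rows():
        print(row)
```

The SQL inside the view uses no APPLY, cursor, or dynamic SQL, so a similar shape may carry over to a SQL Server view, but that has not been tested here.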

GO