Matching multiple columns in a WHERE clause

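I have a table with four columns (Col1, Col2, Col3, Col4) and a few million records in SQL Server 2019.

In a stored procedure I have to pass four input parameters, @Col1, @Col2, @Col3 and @Col4, and it should return success/failure depending on whether all four values are found, regardless of column order. For example, @Col1 can match Col2.

Some values in Col2, Col3 and Col4 can be empty, but Col1 will always contain some data.

I have prepared some sample data and the scenarios I have tested.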

CREATE TABLE SampleData(Id INT IDENTITY(1,1), Col1 VARCHAR(20), Col2 VARCHAR(20), Col3 VARCHAR(20), Col4 VARCHAR(20))

INSERT INTO SampleData(Col1, Col2, Col3, Col4)
SELECT 'ABC','DEF','GHI','JKL' UNION 
SELECT '123','456','789','100' UNION 
SELECT 'ABC','XYZ','','' UNION 
SELECT '9898','6565',NULL,NULL UNION 
SELECT '989844','D656555','','' UNION 
SELECT '8888','9999','7777','6666' UNION 
SELECT '1234','5678','4321',NULL UNION
SELECT '465456465',NULL,NULL,NULL   

 

Stored procedure

CREATE PROC dbo.ValidateSampleData(
 @Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
)
AS
BEGIN
    
    Declare @a as bit = 0, @Message VARCHAR(50) = 'Data Not Matched'
    if(@Col1 is NULL or @Col2 is NULL or @Col3 is NULL or @Col4 is NULL )
    begin
        set @a = 1
    end
 
    SELECT  @Message = 'Data Matched ' 
    FROM    SampleData SD
    where   (Col1 in (@Col1,@Col2,@Col3,@Col4) or (Col1 is null and @a = 1))
    and     (Col2 in (@Col1,@Col2,@Col3,@Col4) or (Col2 is null and @a = 1))
    and     (Col3 in (@Col1,@Col2,@Col3,@Col4) or (Col3 is null and @a = 1))
    and     (Col4 in (@Col1,@Col2,@Col3,@Col4) or (Col4 is null and @a = 1))

    and     (select sum(    (case when Col1 is null then 1 else 0 end)
                    +   (case when Col2 is null then 1 else 0 end)
                    +   (case when Col3 is null then 1 else 0 end)
                    +   (case when Col4 is null then 1 else 0 end)
                    ) from SampleData where Id = SD.Id) = 
            (select sum(    (case when @Col1 is null then 1 else 0 end)
                +   (case when @Col2 is null then 1 else 0 end)
                +   (case when @Col3 is null then 1 else 0 end)
                +   (case when @Col4 is null then 1 else 0 end))) 

    SELECT @Message
END

Here are some sample data sets I have tried

DECLARE 
@Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL 



--Case 0  - Should not match
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'testtest' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 1 
SELECT @Col1 = 'ABC', @Col2 = 'DEF' , @Col3 = 'GHI' , @Col4 = 'JKL'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 2
SELECT @Col1 = 'DEF', @Col2 = 'JKL' , @Col3 = 'ABC' , @Col4 = 'GHI' 
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 3 
SELECT @Col1 = '123', @Col2 = '456' , @Col3 = '789' , @Col4 = '100'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 4 
SELECT @Col1 = '1234', @Col2 = '5678' , @Col3 = '4321' , @Col4 = NULL  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 5 
SELECT @Col1 = '465456465', @Col2 = NULL , @Col3 = NULL , @Col4 = NULL  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 6 
SELECT @Col1 = '8888', @Col2 = '9999' , @Col3 = '7777' , @Col4 = '6666'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 7 
SELECT @Col1 = '9898', @Col2 = '6565' , @Col3 = NULL , @Col4 = NULL
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 8 
SELECT @Col1 = '989844', @Col2 = 'D656555' , @Col3 = '' , @Col4 = '' 
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 9   
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = '' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  
 
 
--Case 10  - Should not match
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'tet' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  
 

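I get the expected result in some cases, but the result is wrong when the table contains NULL values. Case 0 should not match.

Also, it does not look optimized, and it is slow against a large volume of data.

Simply put, I want to check whether all 4 values occur in a row of the table, with NULL matching NULL. If they do, return matched; otherwise, return not matched.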

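My approach would be to first create a table containing every possible ordering (permutation) of the 4 values. This can be done by creating a derived table with 4 rows and referencing it 4 times, with each join ensuring that you do not pick a row that has already been picked, i.e.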

DECLARE @Values TABLE
(
    Col1 VARCHAR(20),
    Col2 VARCHAR(20),
    Col3 VARCHAR(20),
    Col4 VARCHAR(20),
    UNIQUE (Col1, Col2, Col3, Col4)
);

WITH Data AS
(   SELECT  Value, v.Ordinal
    FROM    (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
)
INSERT @Values(Col1, Col2, Col3, Col4)
SELECT  DISTINCT d1.Value, d2.Value, d3.Value, d4.Value
FROM    Data AS d1
        INNER JOIN Data AS d2
            ON d2.Ordinal NOT IN (d1.Ordinal)
        INNER JOIN Data AS d3
            ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
        INNER JOIN Data AS d4
            ON d4.Ordinal NOT IN (d1.Ordinal, d2.Ordinal, d3.Ordinal);

As a very simple example, where @Col1 = 'A' and all the others are NULL, you would end up with something like:

Col1  Col2  Col3  Col4
NULL  NULL  NULL  A
NULL  NULL  A     NULL
NULL  A     NULL  NULL
A     NULL  NULL  NULL
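
To see why only four rows survive in that example, the permutation step can be run on its own. The following is a minimal sketch that reuses the same Data CTE; the Perms CTE and the two output column names are introduced here purely for illustration. Four rows always give 4 x 3 x 2 x 1 = 24 orderings, and DISTINCT (which treats NULLs as equal) collapses the duplicates.

```sql
DECLARE @Col1 VARCHAR(20) = 'A',
        @Col2 VARCHAR(20) = NULL,
        @Col3 VARCHAR(20) = NULL,
        @Col4 VARCHAR(20) = NULL;

WITH Data AS
(   SELECT  v.Ordinal, v.Value
    FROM    (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
),
Perms AS  -- hypothetical name: every ordering of the four inputs
(   SELECT  d1.Value AS Col1, d2.Value AS Col2, d3.Value AS Col3, d4.Value AS Col4
    FROM    Data AS d1
            INNER JOIN Data AS d2
                ON d2.Ordinal NOT IN (d1.Ordinal)
            INNER JOIN Data AS d3
                ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
            INNER JOIN Data AS d4
                ON d4.Ordinal NOT IN (d1.Ordinal, d2.Ordinal, d3.Ordinal)
)
SELECT  (SELECT COUNT(*) FROM Perms) AS AllPermutations,        -- 24
        (SELECT COUNT(*)
         FROM   (SELECT DISTINCT Col1, Col2, Col3, Col4 FROM Perms) AS u
        ) AS DistinctPermutations;                              -- 4 for this input
```

With four different non-NULL inputs both counts would be 24, which is the largest number of rows the table variable ever has to hold.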

You can then use INTERSECT to check against your table. The benefit is that INTERSECT treats NULLs as matching, i.e. SELECT NULL INTERSECT SELECT NULL returns one row, whereas SELECT NULL WHERE NULL = NULL returns none.

IF EXISTS
    (   SELECT  Col1, Col2, Col3, Col4
        FROM    SampleData
        INTERSECT
        SELECT  Col1, Col2, Col3, Col4
        FROM    @Values
    )
BEGIN
    SELECT 'Data Matched';
END
ELSE
BEGIN
    SELECT 'Data not Matched';
END
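
The NULL handling that this relies on can be checked in isolation. This is a small sketch, assuming the default ANSI_NULLS ON setting; the casts are only there to give the NULLs an explicit type.

```sql
-- INTERSECT compares rows as "not distinct", so two NULLs match:
-- this returns one row containing NULL.
SELECT CAST(NULL AS VARCHAR(20)) AS Value
INTERSECT
SELECT CAST(NULL AS VARCHAR(20));

-- An equality predicate evaluates NULL = NULL to UNKNOWN,
-- so the WHERE clause filters the row out: this returns no rows.
SELECT CAST(NULL AS VARCHAR(20)) AS Value
WHERE  CAST(NULL AS VARCHAR(20)) = CAST(NULL AS VARCHAR(20));
```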

The main advantage of this approach (apart from returning your expected results) is that it can take advantage of an index on SampleData, so if you add:

CREATE INDEX IX_SampleData ON dbo.SampleData (Col1, Col2, Col3, Col4);

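then the INTERSECT query can use it, whereas your current logic cannot.

With the small sample data set this makes little difference, but on larger data sets it does. The comparison was run with 200k rows in SampleData.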
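
If you want to check this on your own data, one option is to compare the logical reads reported for each version of the query. The snippet below is only a sketch: it intersects against a single hard-coded row instead of the @Values table variable, and the literal values are taken from the sample data in the question.

```sql
SET STATISTICS IO ON;

-- With IX_SampleData in place, the Messages output should show
-- how many logical reads this lookup needs against SampleData.
SELECT  Col1, Col2, Col3, Col4
FROM    dbo.SampleData
INTERSECT
SELECT  v.Col1, v.Col2, v.Col3, v.Col4
FROM    (VALUES ('ABC', 'DEF', 'GHI', 'JKL')) AS v (Col1, Col2, Col3, Col4);

SET STATISTICS IO OFF;
```

Running the original WHERE-clause logic between the same two SET statements gives a figure to compare against.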

So your full procedure would be something like:

CREATE OR ALTER PROC dbo.ValidateSampleData(
 @Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
)
AS
BEGIN

    DECLARE @Values TABLE
    (
        Col1 VARCHAR(20),
        Col2 VARCHAR(20),
        Col3 VARCHAR(20),
        Col4 VARCHAR(20),
        UNIQUE (Col1, Col2, Col3, Col4)
    );

    WITH Data AS
    (   SELECT  Value, v.Ordinal
        FROM    (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
    )
    INSERT @Values(Col1, Col2, Col3, Col4)
    SELECT  DISTINCT d1.Value, d2.Value, d3.Value, d4.Value
    FROM    Data AS d1
            INNER JOIN Data AS d2
                ON d2.Ordinal NOT IN (d1.Ordinal)
            INNER JOIN Data AS d3
                ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
            INNER JOIN Data AS d4
                ON d4.Ordinal NOT IN (d1.Ordinal, d2.Ordinal, d3.Ordinal);

    IF EXISTS
        (   SELECT  Col1, Col2, Col3, Col4
            FROM    SampleData
            INTERSECT
            SELECT  Col1, Col2, Col3, Col4
            FROM    @Values
        )
    BEGIN
        SELECT 'Data Matched';
    END
    ELSE
    BEGIN
        SELECT 'Data not Matched';
    END
END 
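
As a quick usage check, here are three of the cases from the question run against the sample data as inserted there. The expected results in the comments assume that exact data.

```sql
-- Case 2: same values as the first sample row, in a different order.
EXEC dbo.ValidateSampleData @Col1 = 'DEF', @Col2 = 'JKL', @Col3 = 'ABC', @Col4 = 'GHI';
-- expected: Data Matched

-- Case 7: NULL parameters must line up with NULL columns.
EXEC dbo.ValidateSampleData @Col1 = '9898', @Col2 = '6565', @Col3 = NULL, @Col4 = NULL;
-- expected: Data Matched

-- Case 0: 'testtest' does not appear in any row, so no permutation matches.
EXEC dbo.ValidateSampleData @Col1 = 'ABC', @Col2 = 'XYZ', @Col3 = 'testtest', @Col4 = '';
-- expected: Data not Matched
```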

ADDENDUM

To answer the question in the comments:

I got this warning while creating Index on actual table. Can you explain ? Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'IX_SampleData' has maximum length of 2000 bytes. For some combination of large values, the insert/update operation will fail.

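Presumably your actual columns are larger than those specified in the question (VARCHAR(20))? As the warning says, the maximum key length is 1700 bytes, so the storage required for all 4 columns together must fit within that. Each VARCHAR character takes one byte of storage (with a single-byte collation), so VARCHAR(20) has a maximum of 20 bytes, and with 4 columns your maximum is 80 bytes. If your columns are VARCHAR(500), however, your maximum size is 2000 bytes. Because VARCHAR is variable-length, each value only takes up as much space as it needs. So if you put A into a column defined as VARCHAR(500), it still only needs 1 byte.

Suppose your columns are VARCHAR(500) and you try to insert the following: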

INSERT INTO SampleData(Col1, Col2, Col3, Col4)
SELECT REPLICATE('A', 500), REPLICATE('B', 500), REPLICATE('C', 500), REPLICATE('D', 500)

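In summary, if you plan to use all 500 characters in all 4 columns, your insert will fail. If you were simply being generous with the column lengths, and the combined length of all 4 columns will always stay within 1700 bytes, you will be fine. That said, if you were being generous with the lengths, it may be worth resizing the columns accordingly: there is no cost in terms of extra storage, but defining columns larger than you need carries other costs, so it is best to keep column lengths as accurate as possible.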
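
To judge how close your existing data comes to the limit, you could measure the largest combined size currently stored. This is a rough sketch against the question's table; the MaxCombinedBytes alias is introduced here for illustration, and the figure covers only the column data, not any internal per-row overhead.

```sql
-- DATALENGTH returns the number of bytes used by each value (NULL for NULL),
-- so ISNULL(..., 0) lets NULL columns count as zero bytes.
SELECT  MAX(  ISNULL(DATALENGTH(Col1), 0)
            + ISNULL(DATALENGTH(Col2), 0)
            + ISNULL(DATALENGTH(Col3), 0)
            + ISNULL(DATALENGTH(Col4), 0)) AS MaxCombinedBytes
FROM    dbo.SampleData;
```

A result well below 1700 suggests the warning is not affecting today's data, although it says nothing about values inserted later.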