What is the best way to assert that a set of columns could form a primary key in Snowflake?

Infamously, primary key constraints are not enforced in Snowflake SQL:

-- Generating a table with 4 rows that contain duplicates and NULLs:
CREATE OR REPLACE TEMP TABLE PRIMARY_KEY_TEST AS
SELECT
*
FROM (
           SELECT 1    AS PK, 'TEST_TEXT' AS TEXT
UNION ALL  SELECT 1    AS PK, 'TEST_TEXT' AS TEXT
UNION ALL  SELECT NULL AS PK, NULL        AS TEXT
UNION ALL  SELECT NULL AS PK, NULL        AS TEXT
)
;

SELECT *
FROM PRIMARY_KEY_TEST
;
PK TEXT
1 TEST_TEXT
1 TEST_TEXT
NULL NULL
NULL NULL
-- These constraints will NOT throw any errors in Snowflake
ALTER TABLE PRIMARY_KEY_TEST ADD PRIMARY KEY (PK);
ALTER TABLE PRIMARY_KEY_TEST ADD UNIQUE (TEXT);

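However, knowing that the values of a set of columns are unique for every row and never NULL is a crucial check when updating a set of data.

So I'm looking for an easy to write and read (ideally 1-2 lines) piece of code (probably based on some Snowflake function) that throws an error if a set of columns no longer forms a viable primary key in Snowflake SQL.

Any suggestions?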

You can enforce NOT NULL in Snowflake by adding a NOT NULL constraint on the columns that you do not want to be null.
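As a minimal sketch of that, assuming the PRIMARY_KEY_TEST table from the question: unlike PRIMARY KEY and UNIQUE, NOT NULL is enforced on standard tables, so it can cover the never-NULL half of the check.

```sql
-- NOT NULL is enforced, unlike PRIMARY KEY and UNIQUE on standard tables.
-- With the sample data from the question this statement is expected to be
-- rejected, because the PK column already contains NULLs.
ALTER TABLE PRIMARY_KEY_TEST ALTER COLUMN PK SET NOT NULL;
```

Once the constraint is in place, later inserts of NULL into PK should fail as well. Uniqueness still has to be checked separately.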

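The primary key constraint is informational only; it is not enforced when you insert data into the table. To keep a primary key valid, you must either delete the conflicting data or check whether the row already exists before inserting, and update it if it does. Depending on your operation, you can use one of the following:

  1. MERGE (insert and update).
  2. Use DISTINCT to check whether the row exists, then either update it or delete the old row and insert the new one.
  3. Use the ROW_NUMBER analytic function to identify duplicates.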
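A rough sketch of options 3 and 1 against the PRIMARY_KEY_TEST table from the question. The staging table NEW_ROWS is hypothetical and only there for illustration; it is assumed to have the same PK and TEXT columns.

```sql
-- Option 3: return every row beyond the first one within each PK value.
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY ROW_NUMBER() OVER (PARTITION BY PK ORDER BY TEXT) > 1;

-- Option 1: upsert from a hypothetical staging table NEW_ROWS
-- (an assumption, not part of the original post).
MERGE INTO PRIMARY_KEY_TEST AS t
USING NEW_ROWS AS s
   ON t.PK = s.PK
WHEN MATCHED THEN UPDATE SET t.TEXT = s.TEXT
WHEN NOT MATCHED THEN INSERT (PK, TEXT) VALUES (s.PK, s.TEXT);
```

Two caveats apply. NULL keys all fall into one partition in the ROW_NUMBER query, so a single NULL row would not be flagged and needs its own IS NULL check. The MERGE only keeps PK unique if NEW_ROWS itself has no duplicate or NULL keys, since a NULL never satisfies the ON condition and is inserted as a new row.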

So I'm looking for an easy to write and read (ideally 1-2 lines) piece of code (probably based on some Snowflake function) that throws an error if a set of columns no longer forms a viable primary key in Snowflake SQL

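Such a test query is easy to write using QUALIFY and a windowed COUNT. The pattern is to put the list of primary key columns into the PARTITION BY clause and search for non-unique values; an additional NULL check can be added as well. If the column list is a valid primary key candidate, the query returns no rows; if any rows violate the rules, they are returned: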

-- checking if PK is applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
     OR PK IS NULL;
  
-- checking if TEXT column is applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY TEXT) > 1
     OR TEXT IS NULL;
     
-- checking if PK,TEXT columns are applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY PK,TEXT) > 1
    OR PK IS NULL
    OR TEXT IS NULL;
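If a single TRUE/FALSE answer is more convenient than a list of offending rows, an aggregate comparison is one possible alternative. This is a sketch, not part of the original answer, and it relies on COUNT(DISTINCT ...) skipping any row in which one of the listed columns is NULL.

```sql
-- TRUE only when every row has a non-NULL, distinct (PK, TEXT) combination.
-- COUNT(DISTINCT ...) ignores rows where any listed column is NULL,
-- so either a NULL or a duplicate makes it smaller than COUNT(*).
SELECT COUNT(*) = COUNT(DISTINCT PK, TEXT) AS IS_VALID_PK
FROM PRIMARY_KEY_TEST;
```

For the sample table from the question this should evaluate to FALSE, since it contains both duplicates and NULLs. The QUALIFY form remains more useful when you need to see which rows break the rule.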

I'd still prefer code that can throw an error though

It is possible to use Snowflake Scripting and RAISE an exception:

BEGIN
   LET my_exception EXCEPTION (-20002, 'Columns cannot be used as PK.');

   IF (EXISTS(SELECT *
             FROM PRIMARY_KEY_TEST
              QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
               OR PK IS NULL
       )) THEN
     RAISE my_exception;
  END IF;
END;

-20002 (P0001): Uncaught exception of type 'MY_EXCEPTION' on line 8 at position 5 : Columns cannot be used as PK.
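A variant of the same idea, sketched with the exception declared in a DECLARE section and the check reduced to one boolean. The names pk_violation and is_valid are made up for illustration, and depending on the client the block may need to be wrapped in EXECUTE IMMEDIATE $$ ... $$.

```sql
DECLARE
  -- Hypothetical names; the error number matches the one used above.
  pk_violation EXCEPTION (-20002, 'Columns cannot be used as PK.');
  is_valid BOOLEAN;
BEGIN
  -- COUNT(DISTINCT PK) skips NULLs, so NULLs and duplicates both fail the test.
  SELECT COUNT(*) = COUNT(DISTINCT PK) INTO :is_valid
  FROM PRIMARY_KEY_TEST;

  IF (NOT is_valid) THEN
    RAISE pk_violation;
  END IF;

  RETURN 'PK check passed';
END;
```

Run after each load, a block like this either raises the exception or returns the confirmation string, which makes it straightforward to use as a gate in a scheduled task.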