How should I deal with my UNIQUE constraints during a data migration from Postgres 9.4 to Greenplum?

When I execute the following SQL in Greenplum (it comes from the SQL file generated by pg_dump on Postgres 9.4):

CREATE TABLE "public"."trm_concept" (
"pid" int8 NOT NULL,
"code" varchar(100)  NOT NULL,
"codesystem_pid" int8,
"display" varchar(400) ,
"index_status" int8,
CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);

I get this error:

ERROR:  Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints

Why doesn't Greenplum allow this? I really need this unique constraint to enforce some rules. How can I fix it in Greenplum?

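Many (perhaps most) analytic databases do not support constraints like this at all. Greenplum is somewhat unusual in supporting an enforced PRIMARY KEY.

FWIW, in Redshift I run additional logic after any ETL step that changes data, to make sure my constraints still hold.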

You could try the same approach here, but I strongly suggest that you distribute the table on the columns you want to check.
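A minimal sketch of such a post-ETL check, assuming the table and column names from the question. It only reports violations; what to do with them (fail the load, delete the extra rows) is left open here.

```sql
-- Post-load check: list every (codesystem_pid, code) pair that occurs more than once.
-- Rows with a NULL codesystem_pid are skipped, because a PostgreSQL UNIQUE
-- constraint treats NULLs as distinct and would not have rejected them either.
SELECT codesystem_pid,
       code,
       count(*) AS occurrences
FROM   public.trm_concept
WHERE  codesystem_pid IS NOT NULL
GROUP  BY codesystem_pid, code
HAVING count(*) > 1;
```

An empty result means the rule that the UNIQUE constraint used to guarantee still holds after the load.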

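  • A UNIQUE constraint is implemented with a btree index.
  • PRIMARY KEY implies UNIQUE and NOT NULL.
  • Greenplum distributes rows to its children/shards (or whatever you call them) on whatever you declare as UNIQUE.

For Greenplum to implement the UNIQUE constraint the way you want, that index would have to be

  • copied to every child
  • updated in an ACID-compliant fashion

Doing that would completely remove the benefit of running Greenplum. You might as well go back to PostgreSQL.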

From the docs about CREATE TABLE

When creating a table, there is an additional clause to declare the Greenplum Database distribution policy. If a DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause is not supplied, then Greenplum assigns a hash distribution policy to the table using either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have a column of an eligible data type, the rows are distributed based on a round-robin or random distribution. To ensure an even distribution of data in your Greenplum Database system, you want to choose a distribution key that is unique for each record, or if that is not possible, then choose DISTRIBUTED RANDOMLY.

The same doc says this about PRIMARY KEY:

For a table to have a primary key, it must be hash distributed (not randomly distributed), and the primary key columns (the columns that are unique) must contain all the columns of the Greenplum distribution key.
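Applied to the table from the question, one way to satisfy this rule is to keep the primary key, distribute on it, and drop the UNIQUE constraint. This is a sketch under that assumption, not the only possible layout; the (codesystem_pid, code) rule is then no longer enforced by the database and has to be checked some other way.

```sql
-- Variant A: keep the PRIMARY KEY and hash-distribute on it.
-- The primary key columns (pid) contain the whole distribution key (pid).
CREATE TABLE public.trm_concept (
    pid            int8         NOT NULL,
    code           varchar(100) NOT NULL,
    codesystem_pid int8,
    display        varchar(400),
    index_status   int8,
    CONSTRAINT trm_concept_pkey PRIMARY KEY (pid)
)
DISTRIBUTED BY (pid);
```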

And here is the doc on CREATE INDEX:

In Greenplum Database, unique indexes are allowed only if the columns of the index key are the same as (or a superset of) the Greenplum distribution key. On partitioned tables, a unique index is only supported within an individual partition - not across all partitions.
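If the (codesystem_pid, code) rule matters more than the primary key, the opposite trade-off is also possible. The following sketch assumes you are willing to give up enforced uniqueness of pid: the table is distributed on the two columns and the unique index covers exactly the distribution key.

```sql
-- Variant B: enforce uniqueness of (codesystem_pid, code) instead of pid.
-- pid stays NOT NULL, but its uniqueness is no longer enforced by Greenplum.
CREATE TABLE public.trm_concept (
    pid            int8         NOT NULL,
    code           varchar(100) NOT NULL,
    codesystem_pid int8,
    display        varchar(400),
    index_status   int8
)
DISTRIBUTED BY (codesystem_pid, code);

-- The index key is the same as the distribution key, as the docs require.
CREATE UNIQUE INDEX idx_concept_cs_code
    ON public.trm_concept (codesystem_pid, code);
```

Which variant fits depends on which rule the application can least afford to lose; the other one has to be verified outside the database after each load.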