How should I deal with my UNIQUE constraints during my data migration from Postgres 9.4 to Greenplum?
When I execute the following SQL in Greenplum (taken from the SQL file generated by pg_dump on Postgres 9.4):
CREATE TABLE "public"."trm_concept" (
"pid" int8 NOT NULL,
"code" varchar(100) NOT NULL,
"codesystem_pid" int8,
"display" varchar(400) ,
"index_status" int8,
CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);
I get this error:
ERROR: Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints
Why doesn't Greenplum allow this? I really need this unique constraint to enforce some business rules. How can I fix this in Greenplum?
Many (perhaps most) analytic databases don't support constraints like this at all. Greenplum is somewhat unusual in supporting an enforced PRIMARY KEY.
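FWIW, in Redshift I run extra logic after every ETL step that changes the data, to make sure my constraints still hold.
You could try the same approach here, but I strongly recommend distributing the table on the columns you want to check.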
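A minimal sketch of such a post-load check for the rule from the question, assuming the database no longer enforces UNIQUE ("codesystem_pid", "code"). An empty result means the rule still holds; how the query is wired into the ETL job (for example, failing the step when rows come back) is left open. If the table is distributed by these same columns, Greenplum can typically do the grouping on each segment without moving rows between segments, which is the point of the recommendation above.

```sql
-- Post-load check for the rule UNIQUE ("codesystem_pid", "code").
-- Every returned row is a key pair that occurs more than once.
-- Rows with a NULL codesystem_pid are skipped because a UNIQUE constraint
-- never treats two NULLs as equal ("code" is already NOT NULL).
SELECT "codesystem_pid", "code", count(*) AS copies
FROM "public"."trm_concept"
WHERE "codesystem_pid" IS NOT NULL
GROUP BY "codesystem_pid", "code"
HAVING count(*) > 1;
```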
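- A UNIQUE constraint is implemented with a btree index.
- A primary key implies UNIQUE and NOT NULL.
- Greenplum distributes rows to its children/shards (or whatever you call them) based on whatever you declare as UNIQUE.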
For Greenplum to implement the UNIQUE constraint the way you want, that index would have to be
- replicated to every child
- updated in an ACID-compliant way
Doing that would completely eliminate the benefit of running Greenplum. You might as well go back to PostgreSQL.
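The effect can be observed on real data. A sketch, assuming the table was loaded without the UNIQUE constraint and distributed by "pid" (for example, because only the primary key was kept): gp_segment_id is a Greenplum system column naming the segment that stores each row, so the query counts, for every duplicated (codesystem_pid, code) pair, how many segments hold a copy. A pair spread over more than one segment is one that a segment-local unique index could never have rejected.

```sql
-- For each duplicated (codesystem_pid, code) pair, count the copies and
-- the number of distinct segments they are stored on.
-- segments > 1 means no single segment's local index ever saw both rows.
SELECT "codesystem_pid", "code",
       count(*)                      AS copies,
       count(DISTINCT gp_segment_id) AS segments
FROM "public"."trm_concept"
WHERE "codesystem_pid" IS NOT NULL
GROUP BY "codesystem_pid", "code"
HAVING count(*) > 1;
```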
From the docs on CREATE TABLE:
When creating a table, there is an additional clause to declare the Greenplum Database distribution policy. If a DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause is not supplied, then Greenplum assigns a hash distribution policy to the table using either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have a column of an eligible data type, the rows are distributed based on a round-robin or random distribution. To ensure an even distribution of data in your Greenplum Database system, you want to choose a distribution key that is unique for each record, or if that is not possible, then choose DISTRIBUTED RANDOMLY.
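Whatever distribution key you settle on, you can check how evenly it spreads the rows. A sketch using the gp_segment_id system column: it lists the row count per segment, and one segment holding far more rows than the others points to a skewed key. For example, hash distribution sends all rows with the same "codesystem_pid" to the same segment, so if a few code systems hold most of the concepts, distributing by "codesystem_pid" alone would concentrate them on a few segments.

```sql
-- Rows stored on each segment of the cluster.
-- Large differences between segments indicate a skewed distribution key.
SELECT gp_segment_id, count(*) AS row_count
FROM "public"."trm_concept"
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```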
The same document says this about PRIMARY KEY:
For a table to have a primary key, it must be hash distributed (not randomly distributed), and the primary key column(s) must contain all the columns of the Greenplum distribution key.
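Applied to the table from the question: "pid" and ("codesystem_pid", "code") share no column, so no distribution key can be contained in both, and one of the two constraints has to give. A sketch of keeping the primary key, assuming the rest of the dump stays unchanged; the (codesystem_pid, code) rule then has to be checked after each load. The non-unique index is optional and only helps lookups.

```sql
-- Keep the enforced primary key: the distribution key must be contained
-- in the primary key columns, so the table is distributed by "pid".
CREATE TABLE "public"."trm_concept" (
    "pid"            int8         NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid")
)
DISTRIBUTED BY ("pid");

-- Optional: a plain (non-unique) index still speeds up lookups by
-- code system and code, but it does NOT enforce uniqueness.
CREATE INDEX "idx_concept_cs_code"
    ON "public"."trm_concept" ("codesystem_pid", "code");
```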
And here are the docs on CREATE INDEX:
In Greenplum Database, unique indexes are allowed only if the columns of the index key are the same as (or a superset of) the Greenplum distribution key. On partitioned tables, a unique index is only supported within an individual partition - not across all partitions.
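If the (codesystem_pid, code) rule is the one that must be enforced, the superset rule can be met by distributing on the constraint's own columns. A sketch under that assumption: "pid" keeps NOT NULL but loses its PRIMARY KEY, so its uniqueness is checked after each load instead, for example with the query at the end.

```sql
-- Keep the UNIQUE rule enforced: the constraint columns are the same as
-- the distribution key, which satisfies the superset rule.
CREATE TABLE "public"."trm_concept" (
    "pid"            int8         NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
)
DISTRIBUTED BY ("codesystem_pid", "code");

-- Post-load check for the former primary key: every returned row is a
-- "pid" value that occurs more than once.
SELECT "pid", count(*) AS copies
FROM "public"."trm_concept"
GROUP BY "pid"
HAVING count(*) > 1;
```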