Column Copy and Update vs. Column Create and Insert

Question

I have a table in PostgreSQL 9.2.10 with 32 million rows and 31 columns. I am altering the table by adding columns with updated values.

For example, if the initial table is:

id     initial_color
--     -------------
1      blue
2      red
3      yellow

I am altering the table so that the result is:

id     initial_color     modified_color
--     -------------     --------------
1      blue              blue_green
2      red               red_orange
3      yellow            yellow_brown

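I have code that reads the initial_color column and computes the updated value.

Given that my table has 32 million rows and that I have to apply this procedure to five of the 31 columns, what is the most efficient way to do this? My current choices are:

1. Copy the column and update the rows in the new column
2. Create an empty column and insert the new values

I could do either one column at a time, or all five at once. The column types are either character varying or character.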

Answer 1

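Maybe I am misreading the question, but as far as I know there are two ways to create a table with the extra columns:

CREATE TABLE
This creates a new table, which can be filled using either
- CREATE TABLE .. AS SELECT .. to fill it at creation time, or
- a separate INSERT ... SELECT ... later on.
  Neither variant seems to be what you want, since you described a solution that does not list all the fields.
  Both would also require copying all of the data (plus the new fields).
ALTER TABLE ... ADD ...
This creates the new columns. As I am not aware of any way to reference existing column values in that statement, you will need an additional UPDATE .. SET ... to fill in the values.

So I do not see any way to implement a procedure that follows your choice 1.

In any case, copying the (column) data only to overwrite it in a second step would be suboptimal. Altering a table to add new columns does very little I/O. So even if your choice 1 were possible, choice 2 promises better performance.

Thus, two statements will achieve what you want: one ALTER TABLE that adds all the new columns in one go, then one UPDATE that supplies the new values for those columns.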
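
The two-statement approach described above might look roughly like this. It is only a sketch under assumptions: the table name colors, the second source column initial_shape, and the transformations themselves are hypothetical stand-ins for the asker's real table and conversion code.

```sql
-- Hypothetical demo table modelled on the example in the question.
CREATE TABLE colors (
    id            integer PRIMARY KEY,
    initial_color varchar(20),
    initial_shape varchar(20)   -- assumed second column to be converted
);

INSERT INTO colors VALUES
    (1, 'blue',   'circle'),
    (2, 'red',    'square'),
    (3, 'yellow', 'triangle');

-- Statement 1: add every new column in a single ALTER TABLE.
-- The columns are nullable and have no DEFAULT, so existing rows read as NULL.
ALTER TABLE colors
    ADD COLUMN modified_color varchar(20),
    ADD COLUMN modified_shape varchar(20);

-- Statement 2: fill all new columns in one pass over the table.
UPDATE colors
SET    modified_color = CASE initial_color
                            WHEN 'blue'   THEN 'blue_green'
                            WHEN 'red'    THEN 'red_orange'
                            WHEN 'yellow' THEN 'yellow_brown'
                        END,
       modified_shape = upper(initial_shape);   -- placeholder transformation
```

On the real table, the CASE expression and upper() would be replaced by whatever logic currently computes the new values.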

Answer 2

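Create the new column (modified_color), which will have a value of NULL or blank on all records.

Then run an update statement, assuming your table is named "Table" (double-quoted below because TABLE is a reserved word).

update "Table"
set modified_color = 'blue_green'
where initial_color = 'blue';

If I am right, it can also work like this:

update "Table" set modified_color = 'blue_green' where initial_color = 'blue';
update "Table" set modified_color = 'red_orange' where initial_color = 'red';
update "Table" set modified_color = 'yellow_brown' where initial_color = 'yellow';

Once that is done, you can run another update (assuming you have another column, which I will call modified_color1):

-- column names are not quoted; single quotes would make them string literals
update "Table" set modified_color1 = modified_color;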
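
The per-value updates above each scan the table once. As a possible alternative (not part of this answer), the mapping can be supplied as an inline VALUES list and applied in a single UPDATE. A minimal sketch, assuming the table is really named "Table" and that the three pairs below are the whole mapping; the aliases t, m, old_color and new_color are made up for illustration.

```sql
-- Hypothetical setup so the sketch is self-contained.
CREATE TABLE "Table" (
    id             integer PRIMARY KEY,
    initial_color  varchar(20),
    modified_color varchar(20)
);

INSERT INTO "Table" (id, initial_color) VALUES
    (1, 'blue'), (2, 'red'), (3, 'yellow');

-- One UPDATE that joins every row to its replacement value.
UPDATE "Table" AS t
SET    modified_color = m.new_color
FROM  (VALUES
          ('blue',   'blue_green'),
          ('red',    'red_orange'),
          ('yellow', 'yellow_brown')
      ) AS m(old_color, new_color)
WHERE  t.initial_color = m.old_color;
```

Rows whose initial_color has no entry in the list are left untouched, so modified_color stays NULL for them.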

Answer 3

The column types are either character varying or character.

Don't use character; using it is a misunderstanding. varchar is fine, but I would suggest just text for arbitrary character data.

Any downsides of using data type "text" for storing strings?

Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this?

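If you don't have objects (views, foreign keys, functions) depending on the existing table, the most efficient way is to create a new table. Something like this (details depend on the details of your installation):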

BEGIN;
LOCK TABLE tbl_org IN SHARE MODE;  -- to prevent concurrent writes

CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);

ALTER TABLE tbl_new ADD COLUMN modified_color text
                  , ADD COLUMN modified_something text;
            -- , etc
INSERT INTO tbl_new (<all columns in order here>)
SELECT <all columns in order here>
    ,  myfunction(initial_color) AS modified_color  -- etc
FROM   tbl_org;
-- ORDER  BY tbl_id;  -- optionally order rows while being at it.

-- Add constraints and indexes like in the original table here

DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;

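If you have depending objects, you need to do more.

Either way, be sure to add all five at once. If you update each column in a separate query, you write another row version each time, due to the MVCC model of Postgres.

Related cases with more details, links and explanation: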
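
To see the effect of row versions on a small scale, one could try something like the following. This is a sketch under assumptions: the table mvcc_demo and its columns are hypothetical, and the statistics views are updated asynchronously, so the counts are approximate and may lag behind.

```sql
-- Hypothetical demo table with two source columns and two new columns.
CREATE TABLE mvcc_demo (
    id integer PRIMARY KEY,
    c1 text,
    c2 text,
    n1 text,
    n2 text
);

INSERT INTO mvcc_demo (id, c1, c2)
SELECT g, 'blue', 'red'
FROM   generate_series(1, 100000) AS g;

-- Variant A (to compare against): one UPDATE per column.
-- Each statement writes a new version of every row.
-- UPDATE mvcc_demo SET n1 = c1 || '_green';
-- UPDATE mvcc_demo SET n2 = c2 || '_orange';

-- Variant B: a single UPDATE writes only one new version per row.
UPDATE mvcc_demo
SET    n1 = c1 || '_green',
       n2 = c2 || '_orange';

-- Inspect the on-disk size and the (approximate) live/dead tuple counts.
SELECT pg_size_pretty(pg_relation_size('mvcc_demo')) AS table_size;

SELECT n_live_tup, n_dead_tup
FROM   pg_stat_user_tables
WHERE  relname = 'mvcc_demo';

-- Mark the dead row versions as reusable space and refresh planner statistics.
VACUUM ANALYZE mvcc_demo;
```

Running variant A instead of variant B on a fresh copy of the table and comparing table_size gives a rough idea of the extra writes.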

Updating database rows without locking the table in PostgreSQL 9.2
Best way to populate a new column in a large table?
Optimizing bulk update performance in PostgreSQL

While creating a new table, you might also order the columns in an optimized fashion:

Calculating and saving space in PostgreSQL
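
A rough way to see why column order can matter is to compare the size of two row values that hold the same data in a different order. This is only an illustrative sketch with arbitrarily chosen types: bigint is aligned to 8 bytes, so placing it after a narrower type can introduce padding. For a table consisting mostly of varchar or text columns, as in the question, the gain may be small.

```sql
-- Same four values, two different column orders.
SELECT pg_column_size(ROW(1::smallint, 1::bigint, 1::smallint, 1::bigint)) AS mixed_order,
       pg_column_size(ROW(1::bigint, 1::bigint, 1::smallint, 1::smallint)) AS wide_columns_first;
```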

