Append and Overwrite in Amazon Redshift

Question

Since Redshift is based on PostgreSQL, does it have an option to overwrite or append the data in a table when copying from S3 into Redshift?

The only thing I have found is the use of triggers, but they don't accept any parameters.

If the data is already in the table.

Answer 1

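Redshift doesn't let you create triggers or events the way other SQL databases do. The solution I found is to run the update as a SQL query, although you could also use Python or another language, and schedule it (or an Rscript) as a crontab job.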
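A minimal Python sketch of that approach: a small script a crontab job could run on a schedule. It assumes a DB-API 2.0 connection (for Redshift, for example one opened with redshift_connector or psycopg2); the STATEMENTS list and the helper name run_statements are hypothetical, and the sqlite3 connection in the __main__ block is only a local stand-in so the sketch runs without a cluster.

```python
import sqlite3
from typing import Sequence

# Hypothetical SQL to run on every scheduled execution; replace with your own
# Redshift statements (for example a COPY followed by an UPDATE).
STATEMENTS = [
    "UPDATE mytable SET myfield = REPLACE(myfield, 'foo', 'bar')",
]


def run_statements(conn, statements: Sequence[str]) -> None:
    """Run every statement in one transaction: commit if all of them
    succeed, roll back and re-raise if any of them fails."""
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


if __name__ == "__main__":
    # Local stand-in so the sketch runs anywhere; for real use, open a
    # Redshift connection instead (e.g. redshift_connector.connect(...)).
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE mytable (myfield TEXT)")
    demo.execute("INSERT INTO mytable VALUES ('foo')")
    demo.commit()
    run_statements(demo, STATEMENTS)
    print(demo.execute("SELECT myfield FROM mytable").fetchall())
```

A crontab line such as 0 2 * * * python3 /path/to/job.py (path hypothetical) would run it every day at 02:00. Because all statements share one transaction, a failed step leaves the table as it was before the run.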

Answer 2

When you use the COPY command to load data from Amazon S3 into Amazon Redshift, the data is appended to the target table.

Redshift has no "overwrite" option. If you want to replace existing data with the data being loaded, you can:

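1. Load the data into a temporary (staging) table.
2. Delete the rows in the main table that match the incoming data, for example:

DELETE FROM main_table WHERE id IN (SELECT id FROM temp_table);

3. Insert the rows from the temporary table into the main table, for example:

INSERT INTO main_table SELECT * FROM temp_table;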
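The delete-then-insert steps above can be tried locally. This is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for Redshift, so the COPY into the staging table is replaced by plain INSERTs; the helper merge_from_staging and the table and column names are hypothetical. Running the DELETE and the INSERT in one transaction means no reader sees the table with the old rows removed but the new ones not yet added.

```python
import sqlite3


def merge_from_staging(conn, main: str, staging: str, key: str) -> None:
    """Replace rows of `main` whose `key` also appears in `staging`,
    then insert all staging rows (delete-then-insert merge).
    Table and column names are interpolated directly, so they must be
    trusted identifiers, never user input."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            f"DELETE FROM {main} WHERE {key} IN (SELECT {key} FROM {staging})"
        )
        conn.execute(f"INSERT INTO {main} SELECT * FROM {staging}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (id INTEGER, val TEXT)")
    conn.execute("CREATE TABLE temp_table (id INTEGER, val TEXT)")
    conn.executemany("INSERT INTO main_table VALUES (?, ?)", [(1, "old"), (2, "old")])
    # In Redshift this staging table would be filled by COPY from S3.
    conn.executemany("INSERT INTO temp_table VALUES (?, ?)", [(2, "new"), (3, "new")])
    merge_from_staging(conn, "main_table", "temp_table", "id")
    print(conn.execute("SELECT * FROM main_table ORDER BY id").fetchall())
    # prints [(1, 'old'), (2, 'new'), (3, 'new')]
```

Row 2 is replaced, row 3 is appended and row 1 is untouched. In Redshift itself the staging table would typically be created with CREATE TEMP TABLE temp_table (LIKE main_table); before the COPY.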

See: Updating and Inserting New Data (Amazon Redshift documentation)

Answer 3

As of May 2019, Redshift supports stored procedures, so you can package a group of queries/statements like this:

CREATE OR REPLACE PROCEDURE public.copy_and_cleanse_data(overwrite bool)
AS $$
BEGIN
    -- Optionally empty the table first; otherwise the COPY below appends
    IF overwrite IS TRUE THEN
        DELETE FROM myredshifttable;
    END IF;
    -- Load the file from S3 (COPY always appends to the table)
    copy myredshifttable
        from 's3://awssampledbuswest2/tickit/category_pipe.txt'
        iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
        region 'us-west-2';
    -- Example post-load cleanup
    UPDATE myredshifttable SET myfield = REPLACE(myfield, 'foo', 'bar');
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;

Then run or schedule the following query (pass TRUE to overwrite, FALSE to append):

CALL public.copy_and_cleanse_data(TRUE);
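If stored procedures are not an option, the procedure's overwrite switch can be reproduced on the client side. A minimal Python sketch that only builds the SQL text, leaving execution to whatever Redshift client you use; build_load_sql is a hypothetical helper, and the bucket, role ARN and region are placeholders. With overwrite=False it is a plain append, since COPY always appends.

```python
from typing import List


def build_load_sql(table: str, s3_uri: str, iam_role: str, region: str,
                   overwrite: bool) -> List[str]:
    """Return the statements for one load, wrapped in a single transaction.
    Arguments are interpolated directly, so they must come from trusted
    configuration, not user input."""
    stmts = ["BEGIN;"]
    if overwrite:
        # DELETE rather than TRUNCATE: in Redshift TRUNCATE commits the
        # current transaction, so it could not be rolled back together
        # with a failed COPY.
        stmts.append(f"DELETE FROM {table};")
    stmts.append(
        f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role}' REGION '{region}';"
    )
    stmts.append("COMMIT;")
    return stmts


if __name__ == "__main__":
    for line in build_load_sql(
        "myredshifttable",
        "s3://awssampledbuswest2/tickit/category_pipe.txt",
        "arn:aws:iam::<aws-account-id>:role/<role-name>",
        "us-west-2",
        overwrite=True,
    ):
        print(line)
```

The trade-off of DELETE is that the removed rows stay on disk until a VACUUM reclaims them, whereas TRUNCATE frees the space immediately but cannot be undone if the load fails.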
