How to copy AWS Glue table structure to AWS Redshift

Question

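I created a new database and table structure in AWS Glue without using a crawler, and I can do the same thing with a crawler. That part is not the problem. What I want is to create the same table structure in AWS Redshift, based on the AWS Glue table metadata.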

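I have already done this manually in Python with Django: I read the table's metadata, build a "CREATE TABLE ..." command, and execute it. It works, so I do have this alternative solution. Can we do this from the AWS side, or with an AWS SDK such as Boto3? I don't need any of the data in the table; I only want to create the empty table in AWS Redshift. Is that possible?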
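For readers who want to see what that manual approach might look like with Boto3, here is a minimal sketch. It assumes the table dictionary has the shape returned by the Glue `get_table` call, and that the statement is submitted through the Redshift Data API. The type mapping, the function names, and the `copy_structure` parameters are illustrative assumptions, not part of the original post.

```python
import re

# Assumed mapping from Glue (Hive-style) types to Redshift types.
# It is deliberately small; extend it for the types your tables use.
GLUE_TO_REDSHIFT = {
    "string": "VARCHAR(65535)",
    "tinyint": "SMALLINT",
    "smallint": "SMALLINT",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "timestamp": "TIMESTAMP",
}


def to_redshift_type(glue_type: str) -> str:
    """Translate one Glue column type into a Redshift column type."""
    glue_type = glue_type.strip().lower()
    # Parameterized types keep their parameters, e.g. decimal(10,2).
    match = re.fullmatch(r"(decimal|char|varchar)\((.+)\)", glue_type)
    if match:
        return f"{match.group(1).upper()}({match.group(2)})"
    if glue_type not in GLUE_TO_REDSHIFT:
        raise ValueError(f"No Redshift mapping for Glue type: {glue_type}")
    return GLUE_TO_REDSHIFT[glue_type]


def build_create_table_ddl(glue_table: dict, schema: str = "public") -> str:
    """Build a CREATE TABLE statement from a Glue table definition."""
    storage = glue_table.get("StorageDescriptor", {})
    # Partition keys are stored separately in Glue; append them as columns.
    columns = storage.get("Columns", []) + glue_table.get("PartitionKeys", [])
    lines = [f'    "{c["Name"]}" {to_redshift_type(c["Type"])}' for c in columns]
    body = ",\n".join(lines)
    return (
        f'CREATE TABLE IF NOT EXISTS "{schema}"."{glue_table["Name"]}" '
        f"(\n{body}\n)"
    )


def copy_structure(glue_database, table_name, cluster_id, redshift_db, db_user):
    """Hypothetical helper: read Glue metadata, create the empty table."""
    import boto3  # imported here so the pure functions work without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=glue_database, Name=table_name)["Table"]
    ddl = build_create_table_ddl(table)
    boto3.client("redshift-data").execute_statement(
        ClusterIdentifier=cluster_id,
        Database=redshift_db,
        DbUser=db_user,
        Sql=ddl,
    )
    return ddl
```

Keeping the DDL builder separate from the AWS calls makes it possible to inspect the generated statement before anything is executed against the cluster.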

I also looked at AWS Redshift Spectrum. If I can create this table in AWS Redshift, I can then use Spectrum commands to load data from S3 or any other source. So to do that, I first need the table.

Answer 1

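Assuming your Glue table is populated with the correct schema and all of its partitions, you should be able to query it with Redshift Spectrum without creating an actual table with a CREATE TABLE ... statement.

From your Redshift client/editor, create an external (Spectrum) schema that points to the Data Catalog database containing your Glue tables (named spectrum_db here). The iam_role value should be the ARN of your Redshift cluster's IAM role, to which you have added a policy allowing the glue:GetTable action.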

create external schema spectrum_schema from data catalog 
database 'spectrum_db' 
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'
create external database if not exists;
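Since the question asks about Boto3, the same statement could also be submitted from Python. The following is only a sketch: it assumes the Redshift Data API (`redshift-data` client) is available for your cluster, and the function names and the cluster, database, and user parameters are hypothetical.

```python
def build_external_schema_sql(schema: str, catalog_database: str, iam_role_arn: str) -> str:
    """Build the 'create external schema' statement shown above."""
    # Reject quotes so the values cannot break out of the SQL literals.
    for value in (schema, catalog_database, iam_role_arn):
        if "'" in value or '"' in value or ";" in value:
            raise ValueError(f"Unsafe character in value: {value!r}")
    return (
        f"create external schema {schema} from data catalog "
        f"database '{catalog_database}' "
        f"iam_role '{iam_role_arn}' "
        "create external database if not exists"
    )


def run_on_redshift(sql: str, cluster_id: str, database: str, db_user: str) -> str:
    """Hypothetical helper: submit one statement via the Redshift Data API."""
    import boto3  # imported lazily; only needed when actually executing

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    # The call is asynchronous; the returned Id can be passed to
    # describe_statement to check whether the statement finished.
    return response["Id"]
```

For example, `build_external_schema_sql("spectrum_schema", "spectrum_db", "arn:aws:iam::123456789012:role/MySpectrumRole")` produces the statement above as a single line.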

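You should now be able to run queries over your external Glue tables. The only limitation of doing it this way is that you cannot SELECT * over your tables, so list the columns explicitly: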

SELECT ... FROM spectrum_schema.Your_table

From there, it should be easier to move the data from Spectrum into standard Redshift.

References:
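One possible way to get the empty local table the question asks for is a CREATE TABLE AS statement over the external table with a filter that matches no rows. The sketch below only builds that statement; the function name and the `public` target schema are assumptions, and the column names would have to come from your own table (for instance from the `svv_external_columns` system view). Check the column types that Redshift infers before relying on the result.

```python
# Query that lists the columns of an external table in definition order.
# The %s placeholders are for the external schema name and the table name.
LIST_COLUMNS_SQL = (
    "SELECT columnname FROM svv_external_columns "
    "WHERE schemaname = %s AND tablename = %s ORDER BY columnnum"
)


def build_empty_copy_sql(external_schema, table, columns, target_schema="public"):
    """Build a CTAS statement that copies structure but no rows."""
    if not columns:
        raise ValueError("At least one column name is required")
    # Columns are named explicitly rather than selected with '*'.
    column_list = ", ".join(f'"{name}"' for name in columns)
    return (
        f'CREATE TABLE "{target_schema}"."{table}" AS '
        f'SELECT {column_list} FROM "{external_schema}"."{table}" '
        "WHERE 1 = 0"  # never true, so the new table is created empty
    )
```

Dropping the `WHERE 1 = 0` filter would turn the same statement into a full copy of the data into standard Redshift.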

Creating External Schemas for Amazon Redshift Spectrum


amazon-s3

amazon-web-services

amazon-redshift

amazon-redshift-spectrum

aws-glue