What is the catalog_connection param in AWS Glue?

Question

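I want to run an ETL job periodically, every 4 hours, that will union (combine) data from an S3 bucket (Parquet format) with data from Redshift, find the unique rows, and then write them back to Redshift, replacing the old Redshift data. For writing the dataframe to Redshift, I found this: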

glueContext.write_dynamic_frame.from_jdbc_conf(frame, catalog_connection, connection_options={}, redshift_tmp_dir = "", transformation_ctx="")

Writes a DynamicFrame using the specified JDBC connection information.
frame – The DynamicFrame to write.
catalog_connection – A catalog connection to use.
connection_options – Connection options, such as path and database table (optional).
redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).
transformation_ctx – A transformation context to use (optional).

It seems to be the way to go. But what does catalog_connection mean? Does it refer to the Glue catalog? If so, what in the Glue catalog does it point to?

Answer 1

catalog_connection refers to a Glue connection defined in the Glue Data Catalog.

For example, if your Glue connections include one named redshift_connection, it would be used like this:

glueContext.write_dynamic_frame.from_jdbc_conf(frame = m_df, 
               catalog_connection = "redshift_connection",
               connection_options = {"dbtable": df_name, "database": "testdb"},
               redshift_tmp_dir = "s3://glue-sample-target/temp-dir/")

Here are some examples with more details:
https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/
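
Since the question also asks about replacing the old Redshift data, here is a minimal sketch of one way that could look. It assumes the "preactions" connection option (SQL that runs on Redshift before the load, as described in the article linked above) is used to empty the target table first. The helper names build_redshift_write_options and write_frame_to_redshift are hypothetical, not part of the Glue API, and TRUNCATE is just one possible choice of pre-action.

```python
def build_redshift_write_options(table, database, replace_existing=True):
    """Build the connection_options dict passed to from_jdbc_conf.

    Hypothetical helper for illustration; not part of the Glue API.
    """
    options = {"dbtable": table, "database": database}
    if replace_existing:
        # Assumption: "preactions" runs this SQL on Redshift before the load,
        # so the old rows are removed before the new frame is written.
        options["preactions"] = f"TRUNCATE TABLE {table};"
    return options


def write_frame_to_redshift(glue_context, frame, connection_name,
                            table, database, tmp_dir):
    """Write a DynamicFrame to Redshift through a named Glue connection.

    connection_name is the name of the connection as it appears under
    Glue connections, e.g. "redshift_connection".
    """
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=connection_name,
        connection_options=build_redshift_write_options(table, database),
        redshift_tmp_dir=tmp_dir,
    )
```

Inside a Glue job this would be called with the job's own glueContext and the merged frame, for example write_frame_to_redshift(glueContext, m_df, "redshift_connection", df_name, "testdb", "s3://glue-sample-target/temp-dir/"). Keeping the options in a separate function makes it easy to check what will be sent to Redshift before running the job.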

amazon-web-services

amazon-redshift

aws-glue

aws-glue-data-catalog