如何使用 Python 从 AWS Redshift 卸载数据 table 并保存到 s3 存储桶中（附示例）？

Question

我正在尝试从 AWS redshift tables 中提取数据并使用 Python 保存到 s3 存储桶中。我在 R 中做了同样的事情，但我想在 Python 中复制相同的内容。这是我使用的代码

R

drv <- dbDriver("PostgreSQL")
connection <- dbConnect(drv,
                    host = "xyz.amazonaws.com",
                    port = "abcd",
                    user = "a",
                    password = "b",
                    dbname = "DB")


 dbGetQuery(connection, "UNLOAD ('select COL1,COL2,COL3
                    from xyz 
                    where user_name in (''ythm'')
                     and customer=''RANDOM'' 
                     and utc_date between ''2021-10-01'' and ''2022-01-21'' 
                                                           
               ')
               
               TO 's3://MYBUCKET/Industry_Raw_Data_'
               CREDENTIALS
               'aws_access_key_id=ABC;aws_secret_access_key=HYU'
               DELIMITER '|'
               ALLOWOVERWRITE
               PARALLEL OFF;")




dbDisconnect(connection)

我已经能够使用以下脚本连接到 aws reshift Db

Python

import psycopg2
import pandas as pd

connection=psycopg2.connect(
host="xyz.amazonaws.com",
port = "abcd",
database="DB",
user="a",
password="b")

我正在尝试创建一个 table 并保存到 s3 存储桶中，关于如何在 Python 上实现该目标的任何建议？

Answer 1

创建连接后，您可以运行相同的 UNLOAD 查询。

执行 SQL 语句的方式是创建一个游标并运行使用 'execute' 方法 (https://www.psycopg.org/docs/cursor.html?highlight=execute#cursor.execute):

sql = """UNLOAD ('select COL1,COL2,COL3
                from xyz 
                where user_name in (''ythm'')
                 and customer=''RANDOM'' 
                 and utc_date between ''2021-10-01'' and ''2022-01-21'' 
                                                       
           ')
           
           TO 's3://MYBUCKET/Industry_Raw_Data_'
           CREDENTIALS
           'aws_access_key_id=ABC;aws_secret_access_key=HYU'
           DELIMITER '|'
           ALLOWOVERWRITE
           PARALLEL OFF"""

cur = con.cursor()
cur.execute(sql)
con.commit()
con.close()

如何使用 Python 从 AWS Redshift 卸载数据 table 并保存到 s3 存储桶中（附示例）？

How to unload a data table from AWS Redshift and save into s3 bucket using Python (example attached )?

python

r

amazon-s3

amazon-redshift

postgresql-9.3

R

Python