How to insert data in Redshift using either the boto3 or psycopg2 Python library

Question

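Which of the two libraries, "boto3" or "psycopg2", is best suited for the following Redshift operations in a Python Lambda function:

Look up a table in a Redshift cluster
Create a table in a Redshift cluster
Insert data into a Redshift cluster

I would greatly appreciate a reply with the following:

Python code, using either library, that covers all 3 of the needs above.

Thanks in advance!!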

Answer 1

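Connecting directly to Redshift from Lambda with psycopg2 is the simpler, more straightforward approach, but it has one big limitation. Lambda functions have a run-time limit, and even if your SQL commands stay under the maximum run time, you pay for the Lambda function while it waits for Redshift to finish the SQL. For fast-running SQL commands this isn't an issue, but inserting data can take a while depending on the amount of data.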

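If all of your Redshift operations take no more than a few seconds (and won't grow longer over time), then psycopg2 connecting directly to Redshift is probably the way to go. If the data insert takes a minute or two but the process doesn't run often (say, daily), psycopg2 is probably still the way to go, because Lambda isn't very expensive when it runs infrequently. It is a trade-off between process simplicity and cost.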
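To make the direct-connection route concrete, here is a minimal sketch of the three operations with psycopg2. The helper names (`safe_ident`, `connect`, `table_exists`, `create_table`, `insert_values`), the `public` schema, and the single-column table are assumptions made for illustration, and the lookup assumes `information_schema.tables` is queryable on your cluster. Table names cannot be bound as query parameters, so the sketch validates them before placing them in the SQL text.

```python
import re


def safe_ident(name):
    """Reject anything that is not a plain identifier before it goes into SQL."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def connect(host, dbname, user, password, port=5439):
    # Imported lazily so the helpers below can be tried without the driver installed.
    import psycopg2
    return psycopg2.connect(host=host, dbname=dbname, user=user,
                            password=password, port=port)


def table_exists(cur, table_name, schema="public"):
    # Values are passed separately so the driver handles quoting.
    cur.execute(
        "SELECT 1 FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s",
        (schema, table_name),
    )
    return cur.fetchone() is not None


def create_table(cur, table_name):
    cur.execute(f"CREATE TABLE IF NOT EXISTS {safe_ident(table_name)} (col1 integer)")


def insert_values(cur, table_name, values):
    values = list(values)
    if not values:
        return  # nothing to insert
    # One multi-row INSERT instead of one statement per row.
    placeholders = ", ".join(["(%s)"] * len(values))
    cur.execute(
        f"INSERT INTO {safe_ident(table_name)} (col1) VALUES {placeholders}",
        values,
    )


# Possible usage inside a Lambda handler (connection details are placeholders):
#   conn = connect("my-cluster-endpoint", "dev", "awsuser", "my-password")
#   with conn, conn.cursor() as cur:   # "with conn" commits on success
#       if not table_exists(cur, "testtab"):
#           create_table(cur, "testtab")
#       insert_values(cur, "testtab", [11])
#   conn.close()
```

For large loads, row-by-row or even multi-row INSERT statements may be slow enough to hit the Lambda time limit discussed above, which is where the trade-off in this answer starts to matter.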

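Using the Redshift Data API is more complex. It lets you fire the SQL at Redshift and then terminate the Lambda. A later-running Lambda checks whether the SQL has completed and inspects its results. If the SQL hasn't completed, the Lambda needs to be invoked again later to see whether things are done. This polling is usually handled by a Step Function and a set of different Lambda functions. It isn't super difficult, but it is a level of complexity above a single Lambda. Since this is a polling process, there is a wait time between checks of the results: too long and it adds latency, too short and it causes over-polling and additional cost.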
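The status check at the heart of that polling process can be sketched as below. This is a single-process loop for illustration only, not the Step Function setup described above, where the sleep would typically be a Wait state between Lambda invocations. The function name `wait_for_statement` and the `delay`, `timeout`, and `sleep` parameters are hypothetical; `client` is assumed to be a `boto3.client('redshift-data')`, whose `describe_statement` call reports the statement's `Status`.

```python
import time


def wait_for_statement(client, statement_id, delay=2.0, timeout=60.0, sleep=time.sleep):
    """Poll the Data API until the statement finishes, fails, or the timeout passes."""
    waited = 0.0
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            return desc
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(
                f"statement {statement_id} {status}: {desc.get('Error', '')}"
            )
        if waited >= timeout:
            raise TimeoutError(f"statement {statement_id} still {status} after {waited}s")
        # A longer delay adds latency; a shorter one means more API calls.
        sleep(delay)
        waited += delay


# Possible usage (cluster, database, and user names are placeholders):
#   import boto3
#   client = boto3.client("redshift-data")
#   resp = client.execute_statement(ClusterIdentifier="redshift-cluster-1",
#                                   Database="dev", DbUser="awsuser",
#                                   Sql="INSERT INTO testtab (col1) VALUES (11)")
#   wait_for_statement(client, resp["Id"])
```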

If you need the Data API for time-out reasons, you may want to use psycopg2 alongside it for short-running queries against the database, such as 'does this table exist?'. Use the Data API for long-running steps such as 'insert this 1TB set of data into Redshift'.

Answer 2

Basic Python code samples for all three operations using boto3.

import boto3

clientdata = boto3.client('redshift-data')

# looks up table and returns true if found
def lookup_table(table_name):
  response = clientdata.list_tables(
    ClusterIdentifier='redshift-cluster-1',
    Database='dev',
    DbUser='awsuser',
    TablePattern=table_name
  )
  print(response)
  return len(response['Tables']) > 0

# creates table with one integer column
def create_table(table_name):
  sqlstmt = 'CREATE TABLE '+table_name+' (col1 integer);'
  print(sqlstmt)
  response = clientdata.execute_statement(
    ClusterIdentifier='redshift-cluster-1',
    Database='dev',
    DbUser='awsuser',
    Sql=sqlstmt,
    StatementName='CreateTable'
  )
  print(response)

# inserts one row with integer value for col1
def insert_data(table_name, dval):
  print(dval)
  sqlstmt = 'INSERT INTO '+table_name+'(col1) VALUES ('+str(dval)+');'
  response = clientdata.execute_statement(
    ClusterIdentifier='redshift-cluster-1',
    Database='dev',
    DbUser='awsuser',
    Sql=sqlstmt,
    StatementName='InsertData'
  )
  print(response)

result = lookup_table('date')
if result:
  print("Table exists.")
else:
  print("Table does not exist!")

create_table("testtab")
insert_data("testtab", 11)

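I did not use Lambda; I ran this directly from my shell instead.
Hope this helps. It assumes that credentials and a default region have already been set up for the client.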
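One thing to keep in mind with the Data API is that `execute_statement` returns as soon as the SQL is submitted, so an INSERT sent right after a CREATE TABLE can start before the table exists. A possible way around this, sketched below, is `batch_execute_statement`, which runs its statements in order as a single transaction. The helper name `build_batch_request` and the statement name are hypothetical; the cluster, database, and user values are the placeholders used in this answer. The table name is validated and the values are coerced to integers because they end up in the SQL text.

```python
import re


def build_batch_request(table_name, values, cluster="redshift-cluster-1",
                        database="dev", db_user="awsuser"):
    """Build keyword arguments for redshift-data batch_execute_statement."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # int() rejects anything that is not a number before it reaches the SQL text.
    rows = ", ".join(f"({int(v)})" for v in values)
    if not rows:
        raise ValueError("no values to insert")
    return {
        "ClusterIdentifier": cluster,
        "Database": database,
        "DbUser": db_user,
        "Sqls": [
            f"CREATE TABLE IF NOT EXISTS {table_name} (col1 integer)",
            f"INSERT INTO {table_name} (col1) VALUES {rows}",
        ],
        "StatementName": "CreateAndInsert",
    }


# Possible usage with the client from the answer above:
#   response = clientdata.batch_execute_statement(**build_batch_request("testtab", [11]))
```

The call is still asynchronous, so the returned Id would need to be checked with `describe_statement` to confirm that the batch actually finished.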
