How to query AWS Athena, connected through S3, using Lambda functions in Python

Question

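I have .csv files saved in an S3 bucket, and I can query that data with AWS Athena. Is there a way to connect a Lambda function to Athena and query the data from within the function? Please help.

Thanks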

Answer 1

Yes! You can use boto3 to interact with Athena.

In particular, you will probably want the start_query_execution method.

http://boto3.readthedocs.io/en/latest/reference/services/athena.html#Athena.Client.start_query_execution

Answer 2

As Chris Pollard said, you can use boto3 to query Athena from a Lambda function.

http://boto3.readthedocs.io/en/latest/reference/services/athena.html

Initialize the Athena client:

import boto3
client = boto3.client('athena')

Then execute your query:

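# Start the query. Athena runs it asynchronously and returns an ID right away.
queryStart = client.start_query_execution(
    QueryString='SELECT * FROM myTable',
    QueryExecutionContext={
        'Database': 'myDatabase'
    },
    # Athena writes the result files to this S3 location.
    ResultConfiguration={'OutputLocation': 's3://your-bucket/key'}
)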
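To tie this to the Lambda side of the question, here is a minimal sketch of a handler that starts the query and returns its ID. The make_handler helper and its parameters are hypothetical names for illustration, not part of boto3; the client is passed in so the handler can be exercised without AWS access.

```python
def make_handler(client, query, database, output_location):
    """Build a Lambda handler that starts an Athena query.

    `client` is expected to be a boto3 Athena client, e.g.
    boto3.client('athena'). The helper name and arguments are
    illustrative assumptions.
    """
    def lambda_handler(event, context):
        # Athena runs the query asynchronously; only the ID comes back here.
        response = client.start_query_execution(
            QueryString=query,
            QueryExecutionContext={'Database': database},
            ResultConfiguration={'OutputLocation': output_location},
        )
        return {'QueryExecutionId': response['QueryExecutionId']}

    return lambda_handler


# In the deployed Lambda module, one might wire it up like this:
#   import boto3
#   lambda_handler = make_handler(
#       boto3.client('athena'),
#       'SELECT * FROM myTable',
#       'myDatabase',
#       's3://your-bucket/key',
#   )
```

Returning only the ID keeps this function short-lived; a later step can check the status and fetch the results.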

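If you want to retrieve the results within Lambda (possibly with a second function, because of the execution time limit; see the docs. Note also that you are billed for the function's run time, including time spent waiting), use get_query_execution to check the query's status: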

queryExecution = client.get_query_execution(QueryExecutionId=queryStart['QueryExecutionId'])

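You need to parse the returned object to read the value of the QueryExecution.Status.State field. Keep refreshing the object with get_query_execution() until the state is SUCCEEDED (stop as well if it becomes FAILED or CANCELLED).

Note: do not call get_query_execution() in a tight loop. Instead, use an exponential backoff algorithm to avoid being throttled by the API. You should take this approach with all API calls.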
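The polling with backoff described above could look roughly like the following. This is a sketch, not an official recipe: wait_for_query, its delay values, and the attempt limit are assumptions chosen for illustration, and `client` is expected to be a boto3 Athena client.

```python
import time

# States after which Athena will not change the query status any further.
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}


def wait_for_query(client, query_execution_id, initial_delay=0.5,
                   max_delay=8.0, max_attempts=10, sleep=time.sleep):
    """Poll get_query_execution, doubling the wait between attempts.

    Returns the terminal state, or raises TimeoutError if the query is
    still queued or running after max_attempts checks.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        response = client.get_query_execution(
            QueryExecutionId=query_execution_id
        )
        state = response['QueryExecution']['Status']['State']
        if state in TERMINAL_STATES:
            return state
        sleep(delay)
        # Exponential backoff, capped so a single wait never grows unbounded.
        delay = min(delay * 2, max_delay)
    raise TimeoutError(
        'Query %s did not finish after %d checks'
        % (query_execution_id, max_attempts)
    )
```

Called as wait_for_query(client, queryStart['QueryExecutionId']), it returns the final state, so the caller can fetch results only when that state is 'SUCCEEDED'. Keep the total wait below the Lambda function's configured timeout.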

You can then use get_query_results() to retrieve the results for processing:

results = client.get_query_results(QueryExecutionId=queryStart['QueryExecutionId'])
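The response is a nested structure rather than a plain table, so some unpacking is usually needed. Below is a minimal sketch that turns one page of results into a list of dicts. It assumes a SELECT query, where the first row repeats the column names, and it ignores pagination (NextToken); rows_to_dicts is a hypothetical helper name.

```python
def rows_to_dicts(response):
    """Convert one get_query_results page into a list of row dicts.

    All values arrive as strings; a missing 'VarCharValue' key (a NULL
    cell) becomes None.
    """
    result_set = response['ResultSet']
    columns = [
        col['Name']
        for col in result_set['ResultSetMetadata']['ColumnInfo']
    ]
    records = []
    # Skip the first row, assumed here to be the header of a SELECT result.
    for row in result_set['Rows'][1:]:
        values = [cell.get('VarCharValue') for cell in row['Data']]
        records.append(dict(zip(columns, values)))
    return records
```

For example, rows_to_dicts(results) gives one dict per data row, keyed by column name.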

Answer 3

You can use the boto3 client to query Athena tables.

You can read more about it here: Simple way to query Amazon Athena in python with boto3

Answer 4

The simplest option is to use awswrangler and its custom layer for AWS Lambda:

import awswrangler as wr

sql = "select * from my_table"
# `database` is the Athena database that contains my_table (placeholder name).
df = wr.athena.read_sql_query(
    sql=sql, database="my_database", ctas_approach=True
)


python

amazon-s3

amazon-web-services

boto3

amazon-athena