Copy of file from one s3 location to another does not work after athena query boto3
I am querying results from Athena using boto3, and that works fine.
I then use boto3 again to copy the result file from one S3 bucket to another, but it says the file cannot be found. I can't find a solution. Please help!
When I go to the S3 console I can see the file, but boto3 cannot find it.
import boto3
athena = boto3.client('athena')
s3 = boto3.resource('s3')
BUCKET_NAME = 'bucket1'
bucket = s3.Bucket(BUCKET_NAME)
query = 'SELECT * FROM "db"."table" limit 2'
response = athena.start_query_execution(QueryString=query, QueryExecutionContext={
    'Database': 'db'
}, ResultConfiguration={
    'OutputLocation': 's3://bucket1/',
})
key = response['QueryExecutionId'] + '.csv'
copy_source = {
    'Bucket': 'bucket1',
    'Key': key
}
s3.meta.client.copy(copy_source, 'bucket2', 'main.csv')
The error is:
Traceback (most recent call last):
File "/Users/tanmaysinghal/Vizualization/Python Scripts/test.py", line 23, in <module>
s3.meta.client.copy(copy_source, 'bucket2', 'main.csv')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/boto3/s3/inject.py", line 379, in copy
return future.result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/s3transfer/tasks.py", line 255, in _main
self._submit(transfer_future=transfer_future, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/s3transfer/copies.py", line 110, in _submit
**head_object_request)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
The fundamental problem with your code is that you are trying to copy the Athena output file from S3 before the query has completed.
By the time you look at it in the console, a few seconds have passed and the file is already there.
You have to wait for Athena to finish the query and write the output to S3.

Athena works like this:

1. Submit the query.
2. Check the query status.
3. If the state is `RUNNING`, wait and go back to step 2. If it is `SUCCEEDED`, go to step 4. Otherwise the query failed; take corrective action.
4. Read the output file from S3.
Here is working code:
import boto3
import time

athena = boto3.client('athena')
s3 = boto3.resource('s3')  # needed below for the copy

query = 'SELECT * FROM your-database.your-table limit 10'
response = athena.start_query_execution(QueryString=query, QueryExecutionContext={
    'Database': 'your-database'
}, ResultConfiguration={
    'OutputLocation': 's3://your-s3-output-bucket',
})
execution_id = response['QueryExecutionId']
key = execution_id + '.csv'

# Poll until the query leaves the QUEUED/RUNNING states.
state = 'RUNNING'
while state in ('QUEUED', 'RUNNING'):
    response = athena.get_query_execution(QueryExecutionId=execution_id)
    state = response['QueryExecution']['Status']['State']
    if state == 'FAILED':
        print('FAILED')
    elif state == 'SUCCEEDED':
        s3_path = response['QueryExecution']['ResultConfiguration']['OutputLocation']
        print('S3-Path: ' + s3_path)
    time.sleep(1)

# Only once the query has SUCCEEDED is the output file safe to read or copy.
if state == 'SUCCEEDED':
    copy_source = {
        'Bucket': 'your-s3-output-bucket',
        'Key': key
    }
    s3.meta.client.copy(copy_source, 'bucket2', 'main.csv')
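As a side note, instead of reconstructing the key as `execution_id + '.csv'`, you could parse the `OutputLocation` that `get_query_execution` returns, which stays correct even if the output location includes a prefix. A minimal sketch (the bucket and key below are placeholder values, not real resources):

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into a (bucket, key) pair."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip('/')

# Example with a placeholder Athena output path:
bucket, key = split_s3_uri('s3://your-s3-output-bucket/abcd-1234.csv')
print(bucket, key)  # your-s3-output-bucket abcd-1234.csv
```

The resulting pair can be passed straight into `copy_source` in place of the hard-coded bucket and key.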
Hope this helps!