Unable to query AWS Athena from Python 2.7 when passing an AWS session token to pyathenajdbc.connect()

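I am trying to connect to Athena using pyathenajdbc.connect(). My AWS credentials are set up with multi-factor authentication. When I leave the AWS token out of the connection call, I get the following error: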

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

ERROR: pyathenajdbc.error.DatabaseError: The security token included in the request is invalid. (Service: AmazonAthena; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 0d488c0b-1eed-11e7-bad8-711e54af6b73)

When I include the AWS token in the connection call, I get the following error:

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, token=AWS_SESSION_TOKEN, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

ERROR: pyathenajdbc.error.DatabaseError: The security token included in the request is invalid. (Service: AmazonAthena; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 91751051-1eed-11e7-8347-153dfe3d84a6)

Does anyone know what is going wrong here?

Here is my full code:

from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas
from boto3 import Session
import jpype
jvm_path = jpype.getDefaultJVMPath()

_current_credentials = Session().get_credentials()
AWS_KEY_ID = _current_credentials.access_key
AWS_SECRET = _current_credentials.secret_key
AWS_SESSION_TOKEN = _current_credentials.token
REGION = "us-east-2"

#athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, token=AWS_SESSION_TOKEN, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

cursor = athena_conn.cursor()
query = 'SELECT * FROM xyz.ABC  limit 1;'
cursor.execute(query)
df = as_pandas(cursor)
print(df)

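The problem is not straightforward, but I suspect it is related to your credentials. Investigate them first: try printing your keys and verifying that they are valid.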
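
One way to act on this advice without leaking secrets into a terminal or log is to print a masked summary instead of the raw values. The sketch below is only an illustration and is not part of pyathenajdbc or boto3; `mask` and `describe_credentials` are hypothetical helper names. It relies on the AWS convention that temporary (STS) access key IDs start with `ASIA` while long-term IAM user keys start with `AKIA`, and that a temporary key is only accepted together with its session token.

```python
def mask(value, visible=4):
    """Return `value` with everything after the first `visible` characters hidden."""
    if not value:
        return "<missing>"
    return value[:visible] + "*" * max(len(value) - visible, 0)


def describe_credentials(access_key, secret_key, token=None):
    """Summarize a credential set without printing secrets (hypothetical helper)."""
    if access_key and access_key.startswith("ASIA"):
        kind = "temporary"
    elif access_key and access_key.startswith("AKIA"):
        kind = "long-term"
    else:
        kind = "unknown"
    return {
        "access_key": mask(access_key),
        "secret_key_present": bool(secret_key),
        "token_present": bool(token),
        "kind": kind,
        # A temporary key sent without its session token is rejected by AWS.
        "token_missing": kind == "temporary" and not token,
    }
```

With the variables from the question this could be called as `print(describe_credentials(AWS_KEY_ID, AWS_SECRET, AWS_SESSION_TOKEN))`. If `kind` is `temporary`, the session token has to reach the service along with the key pair.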

Here is an alternative way I use to load the credentials:

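import os
import configparser  # Python 3 name; the Python 2.7 built-in is ConfigParser and lacks the [] access used below

aws_config_file = '~/.aws/config'  # keys are often stored in ~/.aws/credentials instead

Config = configparser.ConfigParser()
Config.read(os.path.expanduser(aws_config_file))

# Read the key pair from the [default] profile
access_key_id = Config['default']['aws_access_key_id']
secret_key_id = Config['default']['aws_secret_access_key']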
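
Since the question involves MFA, the profile may also carry an `aws_session_token` entry. The following is a minimal sketch of the same file-reading idea that returns the token when present and `None` otherwise. `read_profile` is a hypothetical helper name, and the import fallback is there only so the sketch also runs on Python 2.7. Note that in `~/.aws/config` non-default profiles use section names of the form `[profile name]`, whereas `~/.aws/credentials` uses `[name]`.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7


def read_profile(path, profile="default"):
    """Read the AWS keys stored for `profile` in an INI-style file."""
    parser = configparser.RawConfigParser()
    parser.read(os.path.expanduser(path))

    def get(option):
        # Return None instead of raising when the section or option is absent.
        if parser.has_option(profile, option):
            return parser.get(profile, option)
        return None

    return {
        "access_key": get("aws_access_key_id"),
        "secret_key": get("aws_secret_access_key"),
        "token": get("aws_session_token"),
    }
```

For example, `read_profile('~/.aws/credentials')` would return a dictionary whose `token` entry is `None` for a profile that holds only a long-term key pair.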

Otherwise, just to make sure the problem is not related to the JDBC driver, paste the output of the following commands:

import pyathenajdbc 

print(pyathenajdbc.ATHENA_CONNECTION_STRING)
print(pyathenajdbc.ATHENA_DRIVER_CLASS_NAME)
print(pyathenajdbc.ATHENA_DRIVER_DOWNLOAD_URL)
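print(pyathenajdbc.ATHENA_JAR)

Alternatively, export the credentials as environment variables and let the driver read them through EnvironmentVariableCredentialsProvider:

from pyathenajdbc import connect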
from pyathenajdbc.util import as_pandas
from boto3 import Session
import os

_current_credentials = Session().get_credentials()

os.environ['AWS_ACCESS_KEY_ID'] = _current_credentials.access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = _current_credentials.secret_key
os.environ['AWS_SESSION_TOKEN'] = _current_credentials.token


athena_conn = connect(s3_staging_dir='s3://your-bucket/',
           region_name='us-west-2',
           aws_credentials_provider_class='com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider')

cursor = athena_conn.cursor()
query = 'SELECT * FROM schema.table_name limit 1;'
cursor.execute(query)
df = as_pandas(cursor)
print(df)
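
One caveat with the environment-variable approach: for long-term credentials `_current_credentials.token` is `None`, and assigning `None` to an `os.environ` entry raises a `TypeError`. A small sketch of a guard follows; `export_credentials` is a hypothetical helper name, and the `env` parameter exists only so the function can be exercised against a plain dictionary.

```python
import os


def export_credentials(access_key, secret_key, token=None, env=os.environ):
    """Copy credentials into `env`, skipping the token when there is none."""
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    if token:
        env["AWS_SESSION_TOKEN"] = token
    else:
        # Drop any stale token left over from an earlier session.
        env.pop("AWS_SESSION_TOKEN", None)
    return env
```

With the names used above, the three `os.environ[...]` assignments could then be replaced by `export_credentials(_current_credentials.access_key, _current_credentials.secret_key, _current_credentials.token)`.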

Assuming you have a config file under the ~/.aws folder with the region defined, you can use Session().region_name.

The following works fine (no need to import os):

from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas
from boto3 import Session
import jpype
jvm_path = jpype.getDefaultJVMPath()

_current_credentials = Session().get_credentials()
AWS_KEY_ID = _current_credentials.access_key
AWS_SECRET = _current_credentials.secret_key
REGION = Session().region_name

athena_conn = connect(access_key=AWS_KEY_ID,
               secret_key=AWS_SECRET,
               s3_staging_dir='path_to_staging_dir',
               region_name=REGION)

cursor = athena_conn.cursor()

query = 'SELECT current_date;'

cursor.execute(query)
df = as_pandas(cursor)
print(df)
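
Note that `Session().region_name` is `None` when no region is configured, so it can be worth failing early rather than passing `None` on to `connect()`. A minimal sketch, in which `resolve_region` is a hypothetical helper name and the fallback value is whatever the caller chooses:

```python
def resolve_region(session_region, fallback=None):
    """Pick the region to pass to connect(), failing early if none is set."""
    region = session_region or fallback
    if not region:
        raise ValueError("No AWS region configured; set one in ~/.aws/config "
                         "or pass it explicitly.")
    return region
```

In the snippet above this would read `REGION = resolve_region(Session().region_name, fallback='us-east-2')`, where the fallback shown is just the region used in the question.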