Location of S3 data for Amazon Athena

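I created an Amazon S3 bucket and uploaded a flat file (the well-known Iris flower data set, as a CSV).

I now want to create a flat table for the Iris data set in Amazon Athena and query it. I just can't find the 'Location of Input Data Set'.

How do I determine the location of the flat Iris file in the S3 bucket? Is there a tutorial that covers this scenario (Google hasn't been much help so far)?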

If you have the AWS CLI installed, you can use it to find the file:

aws s3 ls s3://bucket_name --recursive | grep iris_csv_file

According to the Amazon Athena CREATE TABLE documentation, the syntax for creating a table is:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS]
 [db_name.]table_name [(col_name data_type [COMMENT col_comment] [, ...] )]
 [COMMENT table_comment]
 [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
 [ROW FORMAT row_format]
 [STORED AS file_format]
 [WITH SERDEPROPERTIES (...)]
 [LOCATION 's3_loc']
 [TBLPROPERTIES ( ['has_encrypted_data'='true | false',] ['classification'='aws_glue_classification',] property_name=property_value [, ...] ) ]

s3_loc is:

Specifies the location of the underlying data in Amazon S3 from which the table is created, for example, s3://mystorage/. For more information about considerations such as data format and permissions, see Create Tables From Underlying Data in Amazon S3.

Use a trailing slash for your folder or bucket. Do not use file names or glob characters.

Use: s3://mybucket/myfolder/

Don't use: s3://path_to_bucket, s3://path_to_bucket/*, or s3://path_to-bucket/mydatafile.dat
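The rules quoted above can be turned into a quick sanity check before you paste a value into LOCATION. This is only a minimal sketch: the function name check_athena_location is made up for illustration, and it checks nothing beyond the three rules in the quote (an s3:// URI, no glob characters, a trailing slash). It does not contact AWS or confirm that the bucket or folder exists.

```python
from urllib.parse import urlparse


def check_athena_location(location):
    """Return a list of problems found in a candidate LOCATION value.

    Hypothetical helper: it only mirrors the documentation's guidance
    and does not verify that the bucket or folder actually exists.
    """
    problems = []
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        problems.append("must be an s3://bucket/... URI")
    # Glob characters are not allowed anywhere in the location.
    if any(ch in location for ch in "*?["):
        problems.append("must not contain glob characters")
    # A trailing slash marks a bucket or folder; a file name would not end in one.
    if not location.endswith("/"):
        problems.append("must end with a trailing slash")
    return problems


if __name__ == "__main__":
    for candidate in ("s3://mybucket/myfolder/", "s3://path_to_bucket/*"):
        print(candidate, check_athena_location(candidate))
```

An empty list means the value passes these checks. A value that ends in a file name, such as s3://path_to-bucket/mydatafile.dat, is reported because it lacks the trailing slash.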

So, if you stored the flat file in a directory named iris in a bucket named my-bucket, you would use:

LOCATION 's3://my-bucket/iris/'

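Note that you point to the directory, not the file. This is because many data sets are stored as multiple files (or even in multiple subdirectories), and Athena reads every file under that location as part of the table.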
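To tie the pieces together, here is a rough sketch that takes an object key as printed by aws s3 ls, derives the folder to use as LOCATION, and fills in a CREATE EXTERNAL TABLE statement. Several things here are assumptions rather than facts from the question: the helper names location_for_object and iris_ddl, the table name iris, the five column names and types, the key iris/iris.csv, and the presence of a header row in the CSV (drop the TBLPROPERTIES line if your file has none).

```python
import posixpath


def location_for_object(bucket, key):
    """Return the s3:// folder URI (with trailing slash) containing the object key."""
    folder = posixpath.dirname(key)
    return f"s3://{bucket}/{folder}/" if folder else f"s3://{bucket}/"


def iris_ddl(location):
    """Build a CREATE EXTERNAL TABLE statement for a comma-separated Iris file.

    The column names and types are assumed; adjust them to match your CSV.
    """
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS iris (\n"
        "  sepal_length DOUBLE,\n"
        "  sepal_width DOUBLE,\n"
        "  petal_length DOUBLE,\n"
        "  petal_width DOUBLE,\n"
        "  species STRING\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Assumes the first line of the CSV is a header row.
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


if __name__ == "__main__":
    # "iris/iris.csv" is a hypothetical key, as it might appear in `aws s3 ls` output.
    print(iris_ddl(location_for_object("my-bucket", "iris/iris.csv")))
```

If the file sits at the top level of the bucket, the derived location is the whole bucket, so every other file in it would be read as table data too. Moving the CSV into its own folder first avoids that.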