Connecting Athena and S3 in the same CloudFormation Stack
From the AWS::Athena::NamedQuery documentation, it's unclear how to attach Athena to an S3 bucket specified in the same stack. If I had to guess from the example, I imagine you could write a template like:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
    # ... other params ...
  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
        CREATE EXTERNAL TABLE db_name.test_table
        (...) LOCATION s3://.../path/to/folder/
Would a template like the one above work? After the stack is created, would the table db_name.test_table be available to run queries against?
Turns out the way you connect S3 and Athena is to make a Glue table! Silly me!! Of course Glue is how you connect things! Sarcasm aside, here is a template that worked for me, using AWS::Glue::Table and AWS::Glue::Database:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
  MyGlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: my-glue-database
        Description: "Glue beats tape"
      CatalogId: !Ref AWS::AccountId
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref MyGlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: my-glue-table
        Parameters: { "classification": "csv" }
        StorageDescriptor:
          Location:
            Fn::Sub: "s3://${MyS3Bucket}/"
          InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
          OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
          SerdeInfo:
            Parameters: { "separatorChar": "," }
            SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
          StoredAsSubDirectories: false
          Columns:
            - Name: column0
              Type: string
            - Name: column1
              Type: string
After this, the database and table show up in the AWS Athena console!
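To tie this back to the original question, you could then also declare a NamedQuery against the Glue database in the same stack. A sketch along those lines (the query name, description, and query text here are just illustrative placeholders):

  MyNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: !Ref MyGlueDatabase
      # Name and Description below are made up for this example
      Name: "SelectFromMyGlueTable"
      Description: "Example query against the Glue-backed table"
      # Ref on AWS::Glue::Database / AWS::Glue::Table returns the
      # database / table name, so Fn::Sub can splice them in
      QueryString: !Sub 'SELECT * FROM "${MyGlueDatabase}"."${MyGlueTable}" LIMIT 10'

Note this only saves the query in the Athena console; it does not execute it on stack creation.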