Connecting Athena and S3 in the same CloudFormation Stack
From the AWS::Athena::NamedQuery documentation, it's unclear how to attach Athena to an S3 bucket specified in the same stack. If I had to guess from the example, I imagine you could write a template like:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
    # ... other params ...
  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
        CREATE EXTERNAL TABLE db_name.test_table
        (...) LOCATION s3://.../path/to/folder/
Would a template like the one above work? After the stack is created, would the table db_name.test_table be available to run queries against?
Turns out the way you connect S3 and Athena is to make a Glue table! Silly me!! Of course Glue is how you connect things! Sarcasm aside, here is a template that worked for me, using AWS::Glue::Table and AWS::Glue::Database:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
  MyGlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: my-glue-database
        Description: "Glue beats tape"
      CatalogId: !Ref AWS::AccountId
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref MyGlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: my-glue-table
        Parameters: { "classification": "csv" }
        StorageDescriptor:
          Location:
            Fn::Sub: "s3://${MyS3Bucket}/"
          InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
          OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
          SerdeInfo:
            Parameters: { "separatorChar": "," }
            SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
          StoredAsSubDirectories: false
          Columns:
            - Name: column0
              Type: string
            - Name: column1
              Type: string
After this, the database and table show up in the AWS Athena console!
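To tie this back to the original question, you could then also declare a NamedQuery against the Glue database in the same stack. A sketch along those lines (the query name, description, and query text here are just illustrative placeholders):

  MyNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: !Ref MyGlueDatabase
      # Name and Description below are made up for this example
      Name: "SelectFromMyGlueTable"
      Description: "Example query against the Glue-backed table"
      # Ref on AWS::Glue::Database / AWS::Glue::Table returns the
      # database / table name, so Fn::Sub can splice them in
      QueryString: !Sub 'SELECT * FROM "${MyGlueDatabase}"."${MyGlueTable}" LIMIT 10'

Note this only saves the query in the Athena console; it does not execute it on stack creation.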