Create Glue data catalog via Athena SDK

Question

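I want to use Athena to query data in an S3 bucket that belongs to another AWS account. I'm using the JavaScript SDK. From reading the documentation, I understand that I first have to create a data catalog that points Athena to the correct S3 location.

I think I have to call the createDataCatalog method. Most of its arguments are self-explanatory, except for the "parameters" argument, which seems to hold information about how the data catalog should be created. However, I can't find anywhere what these parameters should look like.

So my questions are:

What parameters should be provided here?
Is this the correct way to create a Glue data catalog (including a database and tables)?
Once that is done, can I run Athena queries against the data catalog?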

Answer 1

For the simple use case of static S3 data:

1. First create a Glue table with the Glue createTable API, pointing it to the S3 location. There are a few examples in the CLI documentation.
2. Run the queries from Athena.
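Running the query from Athena can be sketched with the Athena client from the same SDK. This is a minimal sketch, not the answerer's code: it assumes an AWS SDK for JavaScript v2 `AWS.Athena` client is passed in (which also lets it be tried with a stub), and `runAthenaQuery`, `pollMs` and the S3 output location are hypothetical names and placeholders. Athena writes query results to an S3 `OutputLocation`, so one has to be supplied unless the workgroup already defines it.

```javascript
// Minimal sketch (hypothetical helper): start an Athena query, wait for it,
// and return the first page of results as arrays of strings.
// `athena` is assumed to be an AWS SDK for JavaScript v2 client (new AWS.Athena()).

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runAthenaQuery(athena, { query, database, outputLocation, pollMs = 1000 }) {
  const { QueryExecutionId } = await athena
    .startQueryExecution({
      QueryString: query,
      QueryExecutionContext: { Database: database }, // Glue database to query
      ResultConfiguration: { OutputLocation: outputLocation }, // S3 prefix for results
    })
    .promise();

  // Poll until the query leaves the QUEUED / RUNNING states.
  for (;;) {
    const { QueryExecution } = await athena
      .getQueryExecution({ QueryExecutionId })
      .promise();
    const { State, StateChangeReason } = QueryExecution.Status;
    if (State === "SUCCEEDED") break;
    if (State === "FAILED" || State === "CANCELLED") {
      throw new Error(`Query ${State}: ${StateChangeReason || "no reason given"}`);
    }
    await sleep(pollMs);
  }

  // Only the first page; for SELECT queries the first row holds the column names.
  const { ResultSet } = await athena
    .getQueryResults({ QueryExecutionId })
    .promise();
  return ResultSet.Rows.map((row) => row.Data.map((cell) => cell.VarCharValue));
}
```

For example, `await runAthenaQuery(new AWS.Athena(), { query: "SELECT * FROM my_table LIMIT 10", database: "my_database", outputLocation: "s3://my-athena-results/" })`, where the database, table and bucket names are placeholders. Larger result sets would need to follow the `NextToken` returned by `getQueryResults`.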

Here is an example that creates a Glue database and a table:

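const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
AWS.config.update({ region: "us-east-1" });

const glue = new AWS.Glue();
// Athena recommends lowercase letters, digits and underscores in
// database/table names; hyphens would need quoting in every query.
const dbName = "test_db";
glue.createDatabase(
  {
    DatabaseInput: {
      Name: dbName,
    },
  },
  function (dbCrtErr, dbRsp) {
    // dbCrtErr is null on success; an already existing database is fine too.
    if (!dbCrtErr || dbCrtErr.code === "AlreadyExistsException") {
      console.log("dbRsp", dbRsp);
      glue.createTable(
        {
          DatabaseName: dbName,
          TableInput: {
            Name: "my_table",
            Parameters: {
              // Must match the file format declared below (Parquet here).
              classification: "parquet",
              compressionType: "none",
            },
            TableType: "EXTERNAL_TABLE",
            StorageDescriptor: {
              // S3 prefix that contains the data files
              Location: "s3://my-s3-bucket-with-events/",
              InputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
              OutputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
              // Athena uses the SerDe named here to read the files.
              // For JSON data instead, use classification "json",
              // org.apache.hadoop.mapred.TextInputFormat,
              // org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              // and SerializationLibrary "org.openx.data.jsonserde.JsonSerDe".
              SerdeInfo: {
                SerializationLibrary:
                  "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
              },
              Columns: [
                {
                  Name: "id",
                  Type: "string",
                },
                {
                  Name: "name",
                  Type: "string",
                },
              ],
            },
          },
        },
        function (error, response) {
          console.log("error", error, "response", response);
        }
      );
    } else {
      console.log("dbCrtErr", dbCrtErr);
    }
  }
);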
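On the first question (what goes into the `Parameters` of `createDataCatalog`): the example above creates the table in the default `AwsDataCatalog` of the account that runs Athena, so no `createDataCatalog` call is needed for it. `createDataCatalog` is only needed to register an additional catalog, for example the Glue Data Catalog owned by the other account. For `Type: "GLUE"` the Athena API documents a single parameter, `catalog-id`, which is the ID of the account that owns the catalog. The sketch below assumes a v2 `AWS.Athena` client is passed in; `buildGlueCatalogParams`, `registerGlueCatalog`, the catalog name and the account ID are hypothetical. The owning account still has to grant access (Glue resource policy and S3 bucket policy), which is not shown.

```javascript
// Minimal sketch (hypothetical helpers): register another account's Glue
// Data Catalog in Athena. `athena` is assumed to be an AWS SDK for
// JavaScript v2 client (new AWS.Athena()).

function buildGlueCatalogParams(catalogName, ownerAccountId) {
  // Own sanity check, not an API requirement: AWS account IDs are 12 digits.
  if (!/^\d{12}$/.test(ownerAccountId)) {
    throw new Error("ownerAccountId must be a 12-digit AWS account ID");
  }
  return {
    Name: catalogName,
    Type: "GLUE",
    Description: `Glue Data Catalog of account ${ownerAccountId}`,
    // For Type GLUE, Parameters holds only the ID of the owning account.
    Parameters: { "catalog-id": ownerAccountId },
  };
}

async function registerGlueCatalog(athena, catalogName, ownerAccountId) {
  const params = buildGlueCatalogParams(catalogName, ownerAccountId);
  await athena.createDataCatalog(params).promise();
  return params;
}
```

Once registered, queries can target that catalog by passing `QueryExecutionContext: { Catalog: "other_account_catalog", Database: "..." }` to `startQueryExecution` (catalog name hypothetical).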
