Setting the number of decimal places when updating Glue Table Schema

Question

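I'm trying to update the definition of a CSV table that was created by a Glue crawler. One of the columns contains decimal data that is currently classified as double.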

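I've found that when I change the schema in the console, I can't set any of the additional properties that may be associated with a data type. For example, if I select Decimal I get Decimal(10,0), with no way to change the precision or the number of decimal places (scale).

What is the recommended way to update this schema so that the column has the correct data type, including those additional properties?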

Answer 1

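I recently ran into similar problems setting decimals on a Glue table schema. I had to create my schema through the AWS CLI.

My case is a bit different: my data is Parquet on my S3 data lake.

The following CLI command creates the schema from a JSON file: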

aws glue create-table --database-name example_db --table-input file://example.json

The example.json below references Parquet files under s3://my-datalake/example/{dt}/, where dt is a partition of my table. dec_col is a column of type decimal(10,2):

{
    "Name": "example",
    "Retention": 0,
    "StorageDescriptor": {
        "Columns": [
            {
                "Name": "id",
                "Type": "int"
            },
            {
                "Name": "dec_col",
                "Type": "decimal(10,2)"
            }
        ],
        "Location": "s3://my-datalake/example/",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "Compressed": false,
        "NumberOfBuckets": 0,
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "Parameters": {
                "serialization.format": "1"
            }
        },
        "SortColumns": [],
        "StoredAsSubDirectories": false
    },
    "PartitionKeys": [
        {
            "Name": "dt",
            "Type": "date"
        }
    ],
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
        "classification": "parquet"
    }
}
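
The decimal(p,s) string in "Type" follows Hive's DECIMAL(precision, scale) syntax: precision is the total number of digits (Hive allows at most 38) and scale is how many of those digits come after the decimal point, so decimal(10,2) holds values up to 99999999.99. As a minimal sketch, assuming you prefer to generate example.json rather than edit it by hand, the Python below builds the same table input and checks the precision/scale bounds first. The helper names decimal_type, max_decimal and build_table_input are hypothetical, not part of any AWS SDK.

```python
import json
from decimal import Decimal

# Hive DECIMAL limits (assumed to apply to Glue/Athena too): 1 <= precision <= 38, 0 <= scale <= precision.
MAX_PRECISION = 38


def decimal_type(precision: int, scale: int) -> str:
    """Return a type string such as 'decimal(10,2)' after checking the bounds."""
    if not 1 <= precision <= MAX_PRECISION:
        raise ValueError(f"precision must be in 1..{MAX_PRECISION}, got {precision}")
    if not 0 <= scale <= precision:
        raise ValueError(f"scale must be in 0..{precision}, got {scale}")
    return f"decimal({precision},{scale})"


def max_decimal(precision: int, scale: int) -> Decimal:
    """Largest value a decimal(precision, scale) column can hold."""
    decimal_type(precision, scale)  # reuse the bounds check
    # The (sign, digits, exponent) tuple builds the value exactly, with no context rounding.
    return Decimal((0, (9,) * precision, -scale))


def build_table_input(name, location, columns, partition_keys):
    """Assemble a dict with the same shape as example.json (Parquet, external table)."""
    return {
        "Name": name,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "Compressed": False,
            "NumberOfBuckets": 0,
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
                "Parameters": {"serialization.format": "1"},
            },
            "SortColumns": [],
            "StoredAsSubDirectories": False,
        },
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
    }


if __name__ == "__main__":
    table_input = build_table_input(
        "example",
        "s3://my-datalake/example/",
        [("id", "int"), ("dec_col", decimal_type(10, 2))],
        [("dt", "date")],
    )
    # Redirect stdout to example.json, then run the create-table command above.
    print(json.dumps(table_input, indent=4))
```

For example, saving this as make_table.py and running `python make_table.py > example.json` should produce a file equivalent to the one shown above.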

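This way you can define the column type as decimal with an explicit precision and scale (here decimal(10,2)), which is exactly what you are looking for.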
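
The answer creates a new table, while the question asks about a table the crawler already made. A possible route for that case is a sketch under assumptions: dump the current definition with `aws glue get-table --database-name example_db --name example > table.json`, change the column's Type, and send it back with `aws glue update-table --database-name example_db --table-input file://update.json`. The `Table` object returned by get-table includes read-only fields (for example DatabaseName, CreateTime, UpdateTime, CatalogId) that are not valid in a table input, so the sketch keeps only an assumed list of table-input keys. That key list and the helper names to_update_input and write_update_file are hypothetical; check them against the current Glue API reference.

```python
import copy
import json

# Assumed set of keys accepted in Glue's TableInput; anything else returned by
# get-table (CreateTime, DatabaseName, CatalogId, VersionId, ...) is dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def to_update_input(get_table_output: dict, column: str, new_type: str) -> dict:
    """Turn `aws glue get-table` output into a table input with one column retyped."""
    # Deep copy so the caller's get-table data is left unchanged.
    table = copy.deepcopy(get_table_output["Table"])
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    for col in table_input["StorageDescriptor"]["Columns"]:
        if col["Name"] == column:
            col["Type"] = new_type  # e.g. "double" -> "decimal(10,2)"
            break
    else:
        raise KeyError(f"column {column!r} not found in StorageDescriptor.Columns")
    return table_input


def write_update_file(table_json_path: str, out_path: str, column: str, new_type: str) -> None:
    """Read saved get-table output and write a file for `update-table --table-input`."""
    with open(table_json_path) as f:
        table_input = to_update_input(json.load(f), column, new_type)
    with open(out_path, "w") as f:
        json.dump(table_input, f, indent=4)
```

For example, `write_update_file("table.json", "update.json", "my_col", "decimal(10,2)")`, where my_col stands in for your own column name, would prepare update.json for the update-table command.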

amazon-web-services

aws-glue

aws-glue-data-catalog