Partitioned JSON queries in Athena return no data

Question

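I have been trying to set up an Athena database for a while now. The database appears to be set up correctly, but queries against it return no data. The data I am querying is a series of partitioned S3 files with the structure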

"S3://bucket_name/data1=partition_1/data2=partition_2/data3=partition_3/data4=partition_4/file.json"

A partition can contain multiple JSON files, for example:

"S3://bucket_name/data1=partition_1/data2=partition_2/data3=partition_3/data4=partition_4/file1.json"
"S3://bucket_name/data1=partition_1/data2=partition_2/data3=partition_3/data4=partition_4/file2.json"

Below are the create command and the query I run against the stored data:

CREATE EXTERNAL TABLE bench_logs (
  id string,
  filename string,
  data struct<transmit_start: timestamp, 
                     transmit_end:timestamp, 
                     transfer_start:timestamp,
                     transfer_end:timestamp,
                     processing_start:timestamp, 
                     processing_end:timestamp>
  )
PARTITIONED BY ( 
  partition_1 string, 
  partition_2 string, 
  partition_3 date, 
  partition_4 string
  )
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION
  's3://benchmark-files/complete/'
TBLPROPERTIES (
  'classification'='json',
  'storage.location.template'='s3://iceqube-benchmark-files/complete/partition_1=${partition_1}/partition_2=${partition_2}/partition_3=${partition_3}/partition_4=${partition_4}/')

The table is queried as:

SELECT id FROM "benchmark"."bench_logs"
WHERE partition_1='foo'
AND partition_2='bar' 
AND partition_3=cast('1970-01-01' as date) 
AND partition_4='09:30:00';

The query runs without errors, but I see no data apart from the column headers.

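If more information is needed, let me know and I will provide it. I have been stuck on this for days and cannot figure it out. Thanks in advance.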

Answer 1

Before you can query a partitioned table, you must add the partitions to it. This can be done with ALTER TABLE bench_logs ADD PARTITION …, by using partition projection, or in other ways.
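As a rough sketch of the first option, the statement below registers a single partition by hand. The values foo, bar, and 1970-01-01 come from the query in the question; the partition_4 value 09-30-00 and the exact S3 prefix are assumptions for illustration, so adjust them to the real layout.

```sql
-- Register one partition explicitly.
-- Assumption: the objects live under a key=value prefix below the table's LOCATION.
ALTER TABLE bench_logs ADD IF NOT EXISTS
  PARTITION (
    partition_1 = 'foo',
    partition_2 = 'bar',
    partition_3 = '1970-01-01',
    partition_4 = '09-30-00'   -- hypothetical value, not taken from the question
  )
  LOCATION 's3://benchmark-files/complete/partition_1=foo/partition_2=bar/partition_3=1970-01-01/partition_4=09-30-00/';

-- Run as a separate query: list the partitions the table now knows about.
SHOW PARTITIONS bench_logs;
```

If SHOW PARTITIONS lists nothing, the table has no registered partitions, and a query that filters on partition columns has nothing to scan.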

Also, you seem to have mixed up the keys and values of the Hive partitioning scheme: if the partition key is called partition_1, the S3 URI should be …/partition_1=data_1/…, not …/data_1=partition_1/….
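Once the prefixes follow the key=value convention with the partition column names as keys, Athena can usually discover them in bulk instead of one at a time. A minimal sketch, assuming the files already sit under the table's LOCATION in that layout:

```sql
-- Scan the table's LOCATION for Hive-style prefixes
-- (partition_1=.../partition_2=.../...) and register any that are missing.
MSCK REPAIR TABLE bench_logs;

-- Run as a separate query: confirm which partitions were picked up.
SHOW PARTITIONS bench_logs;
```

This only helps when the key on the left of each = matches a column named in PARTITIONED BY; prefixes such as data_1=partition_1 would not be matched to the partition_1 column.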

Answer 2

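Coming back to this in case anyone else runs into the same issues.

I followed Theo's advice but still could not query the data.

It turned out the problem was that a partition value contained a ":". This is normally a restricted character in S3, but it was allowed through when I wrote the files programmatically.

A full explanation of this is given in a better answer here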
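One possible way to apply this, sketched under the assumption that the objects have been rewritten to a prefix without a colon (the replacement value 09-30-00 and the S3 path are hypothetical), is to swap the registered partition and query with the new value:

```sql
-- Drop the partition that was registered with the colon-containing value, if any.
ALTER TABLE bench_logs DROP IF EXISTS
  PARTITION (partition_1 = 'foo', partition_2 = 'bar',
             partition_3 = '1970-01-01', partition_4 = '09:30:00');

-- Run as a separate query: register the rewritten, colon-free prefix.
ALTER TABLE bench_logs ADD IF NOT EXISTS
  PARTITION (partition_1 = 'foo', partition_2 = 'bar',
             partition_3 = '1970-01-01', partition_4 = '09-30-00')
  LOCATION 's3://benchmark-files/complete/partition_1=foo/partition_2=bar/partition_3=1970-01-01/partition_4=09-30-00/';

-- Run as a separate query: same filter as in the question, with the new value.
SELECT id FROM "benchmark"."bench_logs"
WHERE partition_1 = 'foo'
  AND partition_2 = 'bar'
  AND partition_3 = cast('1970-01-01' as date)
  AND partition_4 = '09-30-00';
```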

json

amazon-s3

amazon-web-services

amazon-athena