Athena array aggregate and filter multiple columns on condition

Question

I have the following data.

uuid	movie	data
1	movie1	{title=rental, label=GA, price=50, feetype=rental, hidden=false}
1	movie1	{title=tax, label=GA, price=25, feetype=service-fees, hidden=true}
1	movie1	{title=rental, label=GA, price=50, feetype=rental, hidden=false}
1	movie1	{title=tax, label=GA, price=25, feetype=service-fees, hidden=true}
2	movie3	{title=rental, label=VIP, price=100, feetype=rental, hidden=false}
2	movie3	{title=tax, label=VIP, price=25, feetype=service-fees, hidden=true}
2	movie3	{title=promo, label=VIP, price=10, feetype=discount, hidden=false}

And this is the result I want:

uuid	total_fee	total_discount	discount_type
1	150	0	NA
2	125	10	promo

I tried using:

SELECT uuid
   , sum("fee"."price") "total_fee"   
   , array_agg(distinct("fee"."feetype")) "fee_type"
   , array_agg(distinct("fee"."title")) "fee_name"

This gives the following result:

uuid	total_fee	fee_type	fee_name
1	100	[rental]	[rental]
1	50	[service-fees]	[tax]
2	100	[rental]	[rental]
2	25	[service-fees]	[tax]
2	10	[discount]	[promo]

Now, how can I aggregate total_fee and filter fee_name based on fee_type?

I also tried:

, CASE WHEN regexp_like(array_join(fee_type, ','), 'discount') THEN sum("fee") ELSE 0  END "discount"

but this resulted in:

SYNTAX_ERROR: line 207:6: '(CASE WHEN "regexp_like"("array_join"(fee_type, ','), 'discount') THEN "sum"("fee") ELSE 0 END)' must be an aggregate expression or appear in GROUP BY clause

Answer 1

You should be able to do something like this:

SELECT
  uuid,
  SUM(fee.price) AS total_fee,
  SUM(fee.price) FILTER (WHERE fee.feetype = 'discount') AS total_discount,
  ARBITRARY(fee.title) FILTER (WHERE fee.feetype = 'discount') AS discount_type
FROM …
GROUP BY uuid

(I'm assuming that the data column in your example is the same as the fee column in your query.)
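Run against the sample data, the query above returns 135 (100 + 25 + 10) as total_fee for uuid 2, and NULL rather than 0 / NA for uuid 1. The expected output shows 125 for uuid 2, which suggests total_fee should exclude the discount row. The following is a minimal self-contained sketch under that assumption: it inlines the sample rows through VALUES, models fee as a ROW with only the fields the query uses (label and hidden are left out), and wraps the discount columns in COALESCE. The relation names movie_fees and v are hypothetical stand-ins for your real table.

```sql
-- Hypothetical inline copy of the question's sample rows (label/hidden omitted).
WITH movie_fees AS (
  SELECT
    uuid,
    -- Assumption: "fee" is a ROW column with these named fields
    CAST(ROW(title, price, feetype)
         AS ROW(title varchar, price integer, feetype varchar)) AS fee
  FROM (
    VALUES
      (1, 'rental', 50,  'rental'),
      (1, 'tax',    25,  'service-fees'),
      (1, 'rental', 50,  'rental'),
      (1, 'tax',    25,  'service-fees'),
      (2, 'rental', 100, 'rental'),
      (2, 'tax',    25,  'service-fees'),
      (2, 'promo',  10,  'discount')
  ) AS v (uuid, title, price, feetype)
)
SELECT
  uuid,
  -- Exclude discount rows, so uuid 2 gives 100 + 25 = 125
  SUM(fee.price) FILTER (WHERE fee.feetype <> 'discount') AS total_fee,
  -- SUM over zero matching rows is NULL; COALESCE turns it into 0
  COALESCE(SUM(fee.price) FILTER (WHERE fee.feetype = 'discount'), 0) AS total_discount,
  -- Same for the title: no discount row gives NULL, shown as 'NA'
  COALESCE(ARBITRARY(fee.title) FILTER (WHERE fee.feetype = 'discount'), 'NA') AS discount_type
FROM movie_fees
GROUP BY uuid
ORDER BY uuid
-- Expected rows: (1, 150, 0, 'NA') and (2, 125, 10, 'promo')
```

If your real total_fee should include discounts, drop the FILTER on the first SUM and keep the rest unchanged.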

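Aggregate functions support a FILTER clause, which selects the rows to include in the aggregation. The same thing can also be done with, for example, SUM(IF(fee.feetype = 'discount', fee.price, 0)), which is more compact but less elegant.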
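The two forms differ in one detail that matters for the expected output: when a group has no matching rows, the FILTER form aggregates nothing and returns NULL, while the IF form sums zeros and returns 0. The sketch below puts both side by side; it uses flat columns and a hypothetical subset of the sample rows (relation name v is illustrative) instead of the fee ROW, purely for brevity.

```sql
-- Hypothetical inline data: uuid 1 has no discount row, uuid 2 has one.
SELECT
  uuid,
  SUM(price) FILTER (WHERE feetype = 'discount') AS discount_filter, -- NULL for uuid 1, 10 for uuid 2
  SUM(IF(feetype = 'discount', price, 0))        AS discount_if      -- 0 for uuid 1, 10 for uuid 2
FROM (
  VALUES
    (1, 50,  'rental'),
    (1, 25,  'service-fees'),
    (2, 100, 'rental'),
    (2, 10,  'discount')
) AS v (uuid, price, feetype)
GROUP BY uuid
ORDER BY uuid
```

So SUM(IF(..., 0)) already yields the 0 wanted for total_discount, whereas the FILTER form needs a COALESCE around it.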

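The ARBITRARY aggregate function picks an arbitrary value from the group. I don't know whether that suits your case, but I'm assuming there is only one discount row per group. If there can be several, you probably want ARRAY_AGG with DISTINCT instead (for example ARRAY_AGG(DISTINCT fee.title)) to get all of them.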
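For the multi-discount case, one possible sketch collects every discount title per uuid and joins them into a single string, falling back to 'NA'. It deliberately swaps in ARRAY_DISTINCT(ARRAY_AGG(...) FILTER (...)) rather than ARRAY_AGG(DISTINCT ...) FILTER (...), so it does not depend on the engine accepting DISTINCT and FILTER on the same aggregate. The second discount title 'student' and the relation name v are hypothetical, added only to show more than one discount row; flat columns replace the fee ROW for brevity.

```sql
-- Hypothetical inline data: uuid 2 has two discount rows.
SELECT
  uuid,
  -- All distinct discount titles (array order is not guaranteed); NULL if there are none
  ARRAY_DISTINCT(ARRAY_AGG(title) FILTER (WHERE feetype = 'discount')) AS discount_titles,
  -- The same titles as one comma-separated string, 'NA' when there is no discount row
  COALESCE(
    ARRAY_JOIN(ARRAY_DISTINCT(ARRAY_AGG(title) FILTER (WHERE feetype = 'discount')), ','),
    'NA') AS discount_type
FROM (
  VALUES
    (1, 'rental',  'rental'),
    (2, 'rental',  'rental'),
    (2, 'promo',   'discount'),
    (2, 'student', 'discount')
) AS v (uuid, title, feetype)
GROUP BY uuid
ORDER BY uuid
```

With a single discount row per uuid this returns the same discount_type as the ARBITRARY version, just at the cost of building an array.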


Tags: sql, presto, amazon-athena