Hive: getting an error on group by column while using case statements and aggregations

Question

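I am working on a query in Hive. In it I use aggregations such as sum, together with case statements and a group by clause. I have changed the column names and the table name, but the logic is the same as the one I use in my project.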

select 
empname,
empsal, 
emphike,
sum(empsal) as tot_sal,
sum(emphike) as tot_hike,
case when tot_sal > 1000 then exp(tot_hike)
else 0
end as manager
from employee
group by 
empname,
empsal,
emphike

For the query above, the error I get is "Expression not in group by key '1000'". So I modified the query slightly and tried again. My other query is

select 
empname,
empsal, 
emphike,
sum(empsal) as tot_sal,
sum(emphike) as tot_hike,
case when sum(empsal) > 1000 then exp(sum(emphike))
else 0
end as manager
from employee
group by 
empname,
empsal,
emphike

For the query above, it gives me the error "Expression not in group by key 'Manager'". When I add manager to the group by, it reports an invalid alias. Please help me.

Answer 1

I see three problems with your query:

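1.) Hive cannot group by a variable that you define in the select block using the name you have just given it. The same goes for referring to tot_sal and tot_hike inside the case expression of the same select. You will probably need a subquery.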

2.) Hive tends to show errors when the sum or count operations are not at the end of the query.

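3.) Although I don't know what your goal is, I don't think your query will give the desired result. If you group by empsal, there is by design essentially no difference between empsal and sum(empsal): each group holds a single empsal value, so the sum is just that value times the number of rows in the group. The same goes for emphike and sum(emphike).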

I think the following query might solve these problems:

select
  a.empname,
  a.tot_sal,
  a.tot_hike,
  -- the aliases defined in subquery a are visible in the outer select
  if(a.tot_sal > 1000, exp(a.tot_hike), 0) as manager
from
  (select
     empname,
     sum(empsal) as tot_sal,
     sum(emphike) as tot_hike
   from employee
   group by
     empname
  ) a

The if statement is equivalent to your case statement, but I find it easier to read.
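
For comparison, the outer query could also be written with case instead of if. This is only a sketch: it assumes the same employee table and the same subquery alias a as in the query above, and it has not been run against any particular Hive version.

```sql
-- Sketch: the same outer query as above, written with case instead of if().
select
  a.empname,
  a.tot_sal,
  a.tot_hike,
  case
    when a.tot_sal > 1000 then exp(a.tot_hike)
    else 0
  end as manager
from
  (select
     empname,
     sum(empsal) as tot_sal,
     sum(emphike) as tot_hike
   from employee
   group by empname
  ) a
```

Either form refers to tot_sal and tot_hike only in the outer select, where the aliases are already defined.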

In this example you do not need a group by after the subquery, because the grouping is already done inside subquery a.
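
If a subquery is not wanted, one possible alternative is to repeat the aggregate expressions instead of their aliases and to group by empname only. This is a sketch under the assumption that one output row per empname is the goal; it is untested against any particular Hive version and may behave differently on older releases.

```sql
-- Sketch: no subquery; the aggregates are repeated rather than referenced
-- by alias, and only the non-aggregated column appears in group by.
select
  empname,
  sum(empsal) as tot_sal,
  sum(emphike) as tot_hike,
  if(sum(empsal) > 1000, exp(sum(emphike)), 0) as manager
from employee
group by empname
```

The subquery version keeps each aggregate in one place, which can be easier to maintain when the derived column grows more complex.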
