How to write CASE and GROUP BY in a Hive query
This is my Hive table:
course  dept  subject  status
btech   cse   java     pass
btech   cse   hadoop   fail
btech   cse   cg       detained
btech   cse   cc       pass
btech   it    daa      pass
btech   it    wt       pass
btech   it    cnn      pass
mba     hr    hrlaw    pass
mba     hr    hrguid   absent
mtech   cs    java     pass
mtech   cs    cd       pass
mtech   cs    cp       detained
I want to query this table to retrieve data in the following form:
course  dept  status
btech   cse   fail
btech   it    pass
mba     hr    absent
mtech   cs    fail
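For each combination of dept and course, the query should first check the status values for "fail" or "detained". If either is found, it should output "fail" as the status. Otherwise, if "absent" is found in the same group, it should output "absent" as the status. Otherwise, it should output "pass".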
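To pin down the rule before translating it into HiveQL, here is a minimal Python sketch of the same precedence logic applied to the sample rows. The names ROWS, PRIORITY, LABEL and final_status are made up for illustration and are not part of Hive or of the original query.

```python
# Sample rows copied from the table in the question:
# (course, dept, subject, status)
ROWS = [
    ("btech", "cse", "java", "pass"),
    ("btech", "cse", "hadoop", "fail"),
    ("btech", "cse", "cg", "detained"),
    ("btech", "cse", "cc", "pass"),
    ("btech", "it", "daa", "pass"),
    ("btech", "it", "wt", "pass"),
    ("btech", "it", "cnn", "pass"),
    ("mba", "hr", "hrlaw", "pass"),
    ("mba", "hr", "hrguid", "absent"),
    ("mtech", "cs", "java", "pass"),
    ("mtech", "cs", "cd", "pass"),
    ("mtech", "cs", "cp", "detained"),
]

# Lower rank wins within a group; "detained" counts the same as "fail".
PRIORITY = {"fail": 0, "detained": 0, "absent": 1, "pass": 2}
LABEL = {0: "fail", 1: "absent", 2: "pass"}


def final_status(rows):
    """Return {(course, dept): status}, keeping the worst status per group."""
    best = {}
    for course, dept, _subject, status in rows:
        rank = PRIORITY[status]
        key = (course, dept)
        # Keep the smallest rank seen so far for this (course, dept) group.
        best[key] = min(best.get(key, rank), rank)
    return {key: LABEL[rank] for key, rank in sorted(best.items())}


if __name__ == "__main__":
    for (course, dept), status in final_status(ROWS).items():
        print(course, dept, status)
```

The point of the sketch is that the final status is a property of the whole group, not of a single row, so the SQL version has to aggregate over status in some way.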
I get an error message when I run the following query:
select course,dept,
case
when status in ( 'fail','detained') then 'fail'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'pass'
else null
end as Final_Status
from college
group by course,dept;
The problem is that the columns needed by group by have to come last. Below is the modified query; it should work now.
select
case
when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null
end as Final_Status,course,dept
from college
group by course,dept;
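When you group by course and dept, you get multiple values of the status column (one for each record in the group), and that needs to be handled.
Any column in the select that is not part of the group by must be wrapped in an aggregate function.
Here is a solution using the sum() function.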
select course, dept,
       case when sum(case when status in ('fail','detained') then 1 else 0 end) > 0 then 'fail'
            when sum(case when status in ('absent') then 1 else 0 end) > 0 then 'absent'
            when sum(case when status in ('pass') then 1 else 0 end) > 0 then 'pass'
            else 'no_result'
       end as final_status
from college
group by course, dept
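If no Hive cluster is at hand, the sum() query can be sanity-checked locally. The sketch below runs it against an in-memory SQLite database from Python. SQLite is only a stand-in here, on the assumption that it and Hive agree on plain CASE, SUM and GROUP BY; an order by is added solely to make the output order predictable, and ROWS, QUERY and run_query are illustrative names.

```python
import sqlite3

# Sample rows copied from the table in the question:
# (course, dept, subject, status)
ROWS = [
    ("btech", "cse", "java", "pass"),
    ("btech", "cse", "hadoop", "fail"),
    ("btech", "cse", "cg", "detained"),
    ("btech", "cse", "cc", "pass"),
    ("btech", "it", "daa", "pass"),
    ("btech", "it", "wt", "pass"),
    ("btech", "it", "cnn", "pass"),
    ("mba", "hr", "hrlaw", "pass"),
    ("mba", "hr", "hrguid", "absent"),
    ("mtech", "cs", "java", "pass"),
    ("mtech", "cs", "cd", "pass"),
    ("mtech", "cs", "cp", "detained"),
]

# The sum() query from the answer, plus "order by" for a stable row order.
QUERY = """
select course, dept,
       case when sum(case when status in ('fail','detained') then 1 else 0 end) > 0 then 'fail'
            when sum(case when status in ('absent') then 1 else 0 end) > 0 then 'absent'
            when sum(case when status in ('pass') then 1 else 0 end) > 0 then 'pass'
            else 'no_result'
       end as final_status
from college
group by course, dept
order by course, dept
"""


def run_query(rows):
    """Load rows into an in-memory SQLite table and run the grouped query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table college (course text, dept text, subject text, status text)"
        )
        conn.executemany("insert into college values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(*row)
```

On the sample data this returns one row per course and dept, matching the result requested in the question.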
Try this.
select course,dept,
collect_set(
case
when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null
end ) as Final_Status
from college
group by course,dept;
If I understood correctly, you want something like this:
select course,dept,
case
when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null
end as Final_Status
from college
group by course,dept,
CASE when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null END;
I am using CASE in the GROUP BY, and it works fine with Hive.