Efficient subqueries in Hive
I have a table:
Employee  Dept  Visited
1         a     yes
1               yes
1               yes
2         b
1         b     yes
2               yes
3         ab
4         ac    yes
5               yes
5               yes
6         fe
6
7         ad    yes
2         ad    yes
3         a     yes
3         c
6               yes
7
8         a     yes
8               yes
9         fe    yes
I need to find all employees who do not have two or more rows where Dept is NULL and Visited = yes.
I tried writing the following query in Hive:
select c.Employee
from table c
where c.Employee NOT IN (select d.Employee from table d where Visited = 'Yes' and Dept = '' group by d.Employee having count(d.Employee) >=2)
;
It works, but the query takes a long time to run, so I believe it can be done better.
Any suggestions?
I would suggest using having and group by:
select c.Employee
from table c
group by c.Employee
having sum(case when c.dept is null and c.visited = 'Yes' then 1 else 0 end) < 2;
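One thing to watch: the query in the question filters on Dept = '' and Visited = 'Yes', while the sample data shows lowercase yes and the query above tests dept is null. String comparison in Hive is case-sensitive, and an empty string is not the same as NULL, so these conditions may not match the same rows. Here is a minimal sketch, assuming a missing department may be stored either as NULL or as an empty string and that Visited should match regardless of case. The table name employee_visits is hypothetical and stands in for table:

```sql
-- Sketch only: employee_visits is a hypothetical name for the table in the question.
-- Assumption: a "missing" Dept may be NULL or an empty/blank string.
-- Assumption: Visited should match 'yes' regardless of letter case.
select c.Employee
from employee_visits c
group by c.Employee
having sum(case
             when (c.Dept is null or trim(c.Dept) = '')
                  and lower(c.Visited) = 'yes'
             then 1
             else 0
           end) < 2;
```

Note that this returns one row per employee, whereas the NOT IN version returns one row for every matching row in the table, so the outputs differ in duplicates even when they agree on which employees qualify.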