Efficient subqueries in Hive

Question

I have a table:

Employee  Dept  Visited
1         a     yes
1               yes
1               yes
2         b
1         b     yes
2               yes
3         ab
4         ac    yes
5               yes
5               yes
6         fe
6
7         ad    yes
2         ad    yes
3         a     yes
3         c
6               yes
7
8         a     yes
8               yes
9         fe    yes

I need to find all employees who do not have a NULL Dept on two or more rows where Visited = yes.
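A minimal plain-Python sketch of the requirement, applied to the sample table above, makes the expected result concrete. It assumes that blank Dept and Visited cells are missing values (None) and that "yes" is matched exactly as it appears in the data; ROWS and qualifying_employees are hypothetical names used only for this illustration.

```python
from collections import Counter

# Sample table from the question.
# Assumption: blank Dept/Visited cells are missing values, represented as None.
ROWS = [
    (1, "a", "yes"), (1, None, "yes"), (1, None, "yes"), (2, "b", None),
    (1, "b", "yes"), (2, None, "yes"), (3, "ab", None), (4, "ac", "yes"),
    (5, None, "yes"), (5, None, "yes"), (6, "fe", None), (6, None, None),
    (7, "ad", "yes"), (2, "ad", "yes"), (3, "a", "yes"), (3, "c", None),
    (6, None, "yes"), (7, None, None), (8, "a", "yes"), (8, None, "yes"),
    (9, "fe", "yes"),
]


def qualifying_employees(rows):
    """Return employees with fewer than two rows where Dept is missing and Visited is 'yes'."""
    # Count, per employee, the visited rows that have no Dept.
    missing_dept_visits = Counter(
        emp for emp, dept, visited in rows if dept is None and visited == "yes"
    )
    # Keep every employee whose count stays below 2 (Counter returns 0 for absent keys).
    employees = sorted({emp for emp, _, _ in rows})
    return [emp for emp in employees if missing_dept_visits[emp] < 2]


print(qualifying_employees(ROWS))  # [2, 3, 4, 6, 7, 8, 9]
```

On this sample, employees 1 and 5 are excluded because each has two visited rows without a Dept.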

I tried writing the following query in Hive:

select c.Employee
from table c
where c.Employee NOT IN (
    -- employees with two or more visited rows that have an empty Dept
    select d.Employee from table d
    where d.Visited = 'Yes' and d.Dept = ''
    group by d.Employee having count(d.Employee) >= 2
);

It works, but the query takes a long time, so I believe it can be done better. Any suggestions?
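The sketch below runs the question's NOT IN query on the sample table, using Python's built-in sqlite3 module as a stand-in for Hive. That substitution is an assumption: SQLite's dialect and execution differ from Hive's, so this checks the result, not the running time. Blank cells are stored as empty strings to match the query's Dept = '' test, the literal 'Yes' is lowercased to match the sample data, and the table gets the hypothetical name emp_visits because table is a reserved word.

```python
import sqlite3

# Sample rows from the question.
# Assumption: blank cells are empty strings, matching the query's Dept = '' test.
ROWS = [
    (1, "a", "yes"), (1, "", "yes"), (1, "", "yes"), (2, "b", ""),
    (1, "b", "yes"), (2, "", "yes"), (3, "ab", ""), (4, "ac", "yes"),
    (5, "", "yes"), (5, "", "yes"), (6, "fe", ""), (6, "", ""),
    (7, "ad", "yes"), (2, "ad", "yes"), (3, "a", "yes"), (3, "c", ""),
    (6, "", "yes"), (7, "", ""), (8, "a", "yes"), (8, "", "yes"),
    (9, "fe", "yes"),
]

# The question's query; emp_visits is a hypothetical table name and
# 'Yes' is lowercased to match the sample data.
NOT_IN_QUERY = """
select c.Employee
from emp_visits c
where c.Employee not in (
    select d.Employee from emp_visits d
    where d.Visited = 'yes' and d.Dept = ''
    group by d.Employee having count(d.Employee) >= 2
)
"""


def run(query):
    """Load the sample rows into an in-memory SQLite table and return column 0 of the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table emp_visits (Employee integer, Dept text, Visited text)")
    conn.executemany("insert into emp_visits values (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


rows = run(NOT_IN_QUERY)
print(sorted(set(rows)))  # [2, 3, 4, 6, 7, 8, 9]
print(len(rows))          # 15
```

The table is referenced twice, once by the outer query and once by the subquery, and the outer query returns one row per matching table row (15 here) rather than one row per employee.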

Answer 1

I would suggest using group by with having:

select c.Employee
from table c
group by c.Employee
-- count each employee's visited rows with a NULL dept; keep employees with fewer than 2
having sum(case when c.dept is null and c.visited = 'Yes' then 1 else 0 end) < 2;
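This query reads the table once: the case expression turns each qualifying row into a 1, sum(...) counts those rows per employee, and having keeps the employees whose count is below 2. The sketch below checks, on the sample data, that it returns the same employees as the question's NOT IN query with DISTINCT added. It uses Python's sqlite3 module as a stand-in for Hive (an assumption: it demonstrates equal results, not Hive performance), stores blank cells as NULL to match the answer's is null test, lowercases 'Yes' to match the data, and uses the hypothetical table name emp_visits.

```python
import sqlite3

# Sample rows from the question.
# Assumption: blank cells are NULL (None), matching the answer's "is null" test.
ROWS = [
    (1, "a", "yes"), (1, None, "yes"), (1, None, "yes"), (2, "b", None),
    (1, "b", "yes"), (2, None, "yes"), (3, "ab", None), (4, "ac", "yes"),
    (5, None, "yes"), (5, None, "yes"), (6, "fe", None), (6, None, None),
    (7, "ad", "yes"), (2, "ad", "yes"), (3, "a", "yes"), (3, "c", None),
    (6, None, "yes"), (7, None, None), (8, "a", "yes"), (8, None, "yes"),
    (9, "fe", "yes"),
]

# The answer's single-pass query (emp_visits is a hypothetical table name).
GROUP_BY_QUERY = """
select c.Employee
from emp_visits c
group by c.Employee
having sum(case when c.Dept is null and c.Visited = 'yes' then 1 else 0 end) < 2
"""

# The question's NOT IN formulation, adapted to NULL blanks and with DISTINCT
# so that it also returns one row per employee.
NOT_IN_QUERY = """
select distinct c.Employee
from emp_visits c
where c.Employee not in (
    select d.Employee from emp_visits d
    where d.Visited = 'yes' and d.Dept is null
    group by d.Employee having count(d.Employee) >= 2
)
"""


def run(query):
    """Load the sample rows into an in-memory SQLite table and return sorted column 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table emp_visits (Employee integer, Dept text, Visited text)")
    conn.executemany("insert into emp_visits values (?, ?, ?)", ROWS)
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result


print(run(GROUP_BY_QUERY))                       # [2, 3, 4, 6, 7, 8, 9]
print(run(GROUP_BY_QUERY) == run(NOT_IN_QUERY))  # True
```

Unlike the NOT IN version, the group by form emits each employee once without needing DISTINCT. If the real data stores blank departments as empty strings rather than NULL, as the question's Dept = '' suggests, the condition would need adjusting, for example to (c.dept is null or c.dept = '').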
