Best Hive SQL query for this

I have 2 tables like the ones below. I'm running a Hive query, and windowing functions seem very limited in Hive.

Table dept

id | name |
1 | a |
2 | b |
3 | c |
4 | d |

Table time (built by a heavy query, so joining a second copy of the time table would be a very slow process.)

id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |

I need to retrieve something like this:

SELECT d.id, d.name,
       t.`date`  AS firstdate,
       td.`date` AS lastdate
FROM dept d
LEFT JOIN time t  ON d.id = t.id  AND t.first = 1
LEFT JOIN time td ON d.id = td.id AND td.last = 1;
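The following minimal sketch shows the expected output. It loads the sample rows into an in-memory SQLite database and runs the two-join query. SQLite is only a stand-in for Hive here, on the assumption that its LEFT JOIN semantics match Hive's for this query. The identifiers are double-quoted defensively so that names such as `date`, `first` and `last` are never parsed as keywords.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption: same LEFT JOIN semantics here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
    INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO "time" VALUES
        (1, '1992-01-01', 1, 1),
        (2, '1993-02-02', 1, 2),
        (2, '1993-03-03', 2, 1),
        (3, '1993-01-01', 1, 3),
        (3, '1994-01-01', 2, 2),
        (3, '1995-01-01', 3, 1);
""")

# The query from the question: time is joined twice, once for the first row and once for the last.
rows = conn.execute("""
    SELECT d.id, d.name, t."date" AS firstdate, td."date" AS lastdate
    FROM dept d
    LEFT JOIN "time" t  ON d.id = t.id  AND t."first" = 1
    LEFT JOIN "time" td ON d.id = td.id AND td."last" = 1
    ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```

Dept 4 has no rows in time, so both of its dates come back as `None` (NULL).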

What is the most efficient way to do this?

With GROUP BY, the whole operation is done in a single map-reduce job:

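-- Stack the filtered time rows and all dept rows, then collapse them per id
select      id
           ,max(name)   as name                               -- only the dept row carries a name
           ,max(case when first = 1 then `date` end) as firstdate
           ,max(case when last  = 1 then `date` end) as lastdate

from       (select      id
                       ,null as name
                       ,`date`
                       ,first
                       ,last

            from        time

            where       first = 1                             -- keep only first/last rows
                    or  last  = 1

            union all

            select      id
                       ,name
                       ,null as `date`
                       ,null as first
                       ,null as last

            from        dept                                  -- every id appears, even without time rows
            ) t

group by    id
;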

+----+------+------------+------------+
| id | name | firstdate  |  lastdate  |
+----+------+------------+------------+
|  1 | a    | 1992-01-01 | 1992-01-01 |
|  2 | b    | 1993-02-02 | 1993-03-03 |
|  3 | c    | 1993-01-01 | 1995-01-01 |
|  4 | d    | (null)     | (null)     |
+----+------+------------+------------+      
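The following sketch runs the UNION ALL + GROUP BY query on the sample data. It makes the same assumption as before, that SQLite is a stand-in for Hive. It should return the same four rows as the table above while reading time only once, not joining it twice.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption: same aggregation semantics here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
    INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO "time" VALUES
        (1, '1992-01-01', 1, 1),
        (2, '1993-02-02', 1, 2),
        (2, '1993-03-03', 2, 1),
        (3, '1993-01-01', 1, 3),
        (3, '1994-01-01', 2, 2),
        (3, '1995-01-01', 3, 1);
""")

# Stack filtered time rows with dept rows; MAX ignores NULLs, so each column keeps its one real value.
rows = conn.execute("""
    SELECT id,
           MAX(name) AS name,
           MAX(CASE WHEN "first" = 1 THEN "date" END) AS firstdate,
           MAX(CASE WHEN "last"  = 1 THEN "date" END) AS lastdate
    FROM (
        SELECT id, NULL AS name, "date", "first", "last"
        FROM "time"
        WHERE "first" = 1 OR "last" = 1
        UNION ALL
        SELECT id, name, NULL, NULL, NULL
        FROM dept
    ) u
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The dept branch of the UNION ALL is what keeps id 4 in the output even though it has no time rows.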
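
A LEFT JOIN + GROUP BY variant also works, but the first = 1 or last = 1 filter has to be applied to time before the join. If it is placed in the WHERE clause, it discards the rows where t is NULL, turning the outer join into an inner join, so dept 4 would disappear from the result.

select      d.id
           ,max(d.name)   as name
           ,max(case when t.first = 1 then t.`date` end) as firstdate
           ,max(case when t.last  = 1 then t.`date` end) as lastdate

from        dept d
            left join (select  id, `date`, first, last
                       from    time
                       where   first = 1 or last = 1
                      ) t
            on d.id = t.id

group by    d.id
;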
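In a LEFT JOIN version, the placement of the first = 1 or last = 1 filter matters. This sketch runs the join twice, first with the filter in WHERE and then with the filter applied in a subquery before the join. SQLite again stands in for Hive, on the assumption that outer-join NULL handling is the same.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption: same outer-join semantics here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
    INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO "time" VALUES
        (1, '1992-01-01', 1, 1),
        (2, '1993-02-02', 1, 2),
        (2, '1993-03-03', 2, 1),
        (3, '1993-01-01', 1, 3),
        (3, '1994-01-01', 2, 2),
        (3, '1995-01-01', 3, 1);
""")

SELECT_LIST = """
    SELECT d.id, MAX(d.name),
           MAX(CASE WHEN t."first" = 1 THEN t."date" END),
           MAX(CASE WHEN t."last"  = 1 THEN t."date" END)
"""

# Filter in WHERE: runs after the join, so rows where t is NULL (dept 4) are dropped.
where_rows = conn.execute(SELECT_LIST + """
    FROM dept d LEFT JOIN "time" t ON d.id = t.id
    WHERE t."first" = 1 OR t."last" = 1
    GROUP BY d.id ORDER BY d.id
""").fetchall()

# Filter in a subquery: runs before the join, so every dept row survives the LEFT JOIN.
prefiltered_rows = conn.execute(SELECT_LIST + """
    FROM dept d
    LEFT JOIN (SELECT * FROM "time" WHERE "first" = 1 OR "last" = 1) t ON d.id = t.id
    GROUP BY d.id ORDER BY d.id
""").fetchall()

print("filter in WHERE:   ", where_rows)
print("filter before join:", prefiltered_rows)
```

Only the pre-filtered version returns the (4, 'd', None, None) row.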