Hadoop

Question

I have 2 tables with the following columns:

Table 1

col1   col2   col3     val
11     221    38       10
null   90     null     989
78     90     null     77

Table 2

col1   col2   col3  
12     221    78
23     null   67 
78     90     null

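I want to join these 2 tables on col1 first and stop there if the values match. If col1 does not match, join on col2, and failing that, join on col3. If any column matches, fill in val, otherwise null, and put the name of the matching column in a matchingcol column. So the output should look like this: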

col1   col2   col3     val     matchingcol
11     221    38       10      col2
null   90     null     null    null
78     90     null     77      col1
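The matching rule described above can be sketched in plain Python as a reference for what any query should return. This is a minimal sketch, not the asker's code: the names TABLE1, TABLE2, MATCH_ORDER and cascade_match are hypothetical, None stands for SQL NULL (which never matches anything), and it emits one output row per table 1 row, whereas a SQL join would repeat a table 1 row once per matching table 2 row.

```python
# Sample data from the question; None stands for SQL NULL.
TABLE1 = [
    {"col1": 11, "col2": 221, "col3": 38, "val": 10},
    {"col1": None, "col2": 90, "col3": None, "val": 989},
    {"col1": 78, "col2": 90, "col3": None, "val": 77},
]
TABLE2 = [
    {"col1": 12, "col2": 221, "col3": 78},
    {"col1": 23, "col2": None, "col3": 67},
    {"col1": 78, "col2": 90, "col3": None},
]
# Hypothetical: priority order in which columns are tried.
MATCH_ORDER = ("col1", "col2", "col3")


def cascade_match(t1_rows, t2_rows):
    """Hypothetical helper: try col1, then col2, then col3; stop at the first hit."""
    out = []
    for r in t1_rows:
        matched = None
        for col in MATCH_ORDER:
            # NULL never equals anything in SQL, so a None key cannot match.
            if r[col] is not None and any(s[col] == r[col] for s in t2_rows):
                matched = col
                break
        # val is kept only when some column matched, otherwise NULL.
        out.append((r["col1"], r["col2"], r["col3"],
                    r["val"] if matched else None, matched))
    return out


for row in cascade_match(TABLE1, TABLE2):
    print(row)
```

Under this rule the second row of table 1 (col2 = 90) matches the third row of table 2 on col2, so it comes out as 989 / col2 rather than the null / null shown in the expected output; either that expected row or the sample data may need a second look.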

I can do this with the following query, but it is very slow. Is there a better way to write it so that it runs faster?

select *
from table1 t1 left join
     table2 t2_1
     on t2_1.col1 = t1.col1 left join
     table2 t2_2
     on t2_2.col2 = t1.col2 and t2_1.col1 is null
     left join table2 t2_3 on t2_3.col3 = t1.col3 and t2_2.col2 is null

PS: I asked the same question before but did not get a better answer.

Answer 1

What you describe is:

select t1.col1, t1.col2, t1.col3,
       -- keep val only if one of the three joins found a match
       (case when t2_1.col1 is not null or t2_2.col2 is not null or t2_3.col3 is not null then t1.val end) as val,
       -- report the first column that matched, in priority order
       (case when t2_1.col1 is not null then 'col1'
             when t2_2.col2 is not null then 'col2'
             when t2_3.col3 is not null then 'col3'
        end) as matchingcol
from table1 t1 left join
     table2 t2_1
     on t2_1.col1 = t1.col1 left join
     -- try col2 only when col1 did not match
     table2 t2_2
     on t2_2.col2 = t1.col2 and t2_1.col1 is null left join
     -- try col3 only when neither col1 nor col2 matched
     table2 t2_3
     on t2_3.col3 = t1.col3 and t2_1.col1 is null and t2_2.col2 is null;
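To sanity-check the join logic on the question's sample data, the query can be run against an in-memory SQLite database from Python. This is a sketch under the assumption that SQLite's LEFT JOIN and CASE behave like Hive/Impala for this query (both follow standard SQL here); it checks correctness only and says nothing about performance on a cluster. The ORDER BY t1.rowid clause is SQLite-specific and only there to make the output order stable.

```python
import sqlite3

# In-memory SQLite stands in for Hive/Impala purely to check the join logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, val INTEGER);
CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
""")
# Sample rows from the question; None is inserted as NULL.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(11, 221, 38, 10), (None, 90, None, 989), (78, 90, None, 77)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(12, 221, 78), (23, None, 67), (78, 90, None)])

QUERY = """
SELECT t1.col1, t1.col2, t1.col3,
       CASE WHEN t2_1.col1 IS NOT NULL OR t2_2.col2 IS NOT NULL
                 OR t2_3.col3 IS NOT NULL THEN t1.val END AS val,
       CASE WHEN t2_1.col1 IS NOT NULL THEN 'col1'
            WHEN t2_2.col2 IS NOT NULL THEN 'col2'
            WHEN t2_3.col3 IS NOT NULL THEN 'col3' END AS matchingcol
FROM table1 t1
LEFT JOIN table2 t2_1 ON t2_1.col1 = t1.col1
LEFT JOIN table2 t2_2 ON t2_2.col2 = t1.col2 AND t2_1.col1 IS NULL
LEFT JOIN table2 t2_3 ON t2_3.col3 = t1.col3
                     AND t2_1.col1 IS NULL AND t2_2.col2 IS NULL
ORDER BY t1.rowid
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this data each table 1 row joins at most one table 2 row, so the query returns three rows: 10 / col2, 989 / col2 and 77 / col1.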

This is probably the best approach.

Hadoop - Hive - Impala - rewrite a query for performance

sql

hive

impala