Hive Merge Query - Error evaluating cardinality_violation(_col0,_col1)

I am trying to run a Hive MERGE query. It fails with the following error.

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":{"transactionid":0,"bucketid":-1,"rowid":1},"_col1":"2020-10-28"},"value":{"_col0":1}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":{"transactionid":0,"bucketid":-1,"rowid":1},"_col1":"2020-10-28"},"value":{"_col0":1}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
        ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0,_col1)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:86)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1022)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:827)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:701)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:767)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
        ... 7 more
Caused by: java.lang.RuntimeException: Cardinality Violation in Merge statement: [0, -1, 1],2020-10-12
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFCardinalityViolation.evaluate(GenericUDFCardinalityViolation.java:56)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
        ... 15 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Below is the query:

MERGE INTO TABLE1 A
USING (select * from TABLE2) B
ON
LOWER(TRIM(A.A)) = LOWER(TRIM(B.A)) AND
LOWER(TRIM(A.B)) = LOWER(TRIM(B.B))
WHEN MATCHED AND LOWER(TRIM(A.C)) = LOWER(TRIM(B.C))  OR TRIM(A.D)= TRIM(B.D)
THEN
UPDATE SET
A= regexp_replace(A,"[^ ']","#"),
B= regexp_replace(B,"[^@.]","#"),
C= regexp_replace(C,"[^.-]","#"),
D= regexp_replace(D, "[^ ']","#"),
E= regexp_replace(E, "[^ ']","#" ),
F= regexp_replace(F, "[^ .+-]","#"),
G= regexp_replace(G,"[^ ']","#"),
H= regexp_replace(H,"[^ ']","#"),
I= regexp_replace(I,"[^ ']","#"),
J= regexp_replace(J,"[^ ']","#"),
K= regexp_replace(K,"[^ ']","#"),
L= regexp_replace(L,"[^ .+-]","#"),
M= regexp_replace(M,"[^ ']","#"),
N= regexp_replace(N,"[^ ']","#"),
O= regexp_replace(O,"[^ ']","#"),
P= regexp_replace(P,"[^ ']","#"),
Q= regexp_replace(Q,"[^ ']","#"),
R= regexp_replace(R,"[^ .+-]","#"),
S= regexp_replace(S,"[^ ']","#"),
T= regexp_replace(T,"[^ +-.]","#");

I tried toggling the cardinality check, but then the query failed with an array index out of bounds exception.

Please share any knowledge or insight about a solution.

I checked several Stack Overflow posts but did not find anything related to this issue.

Thanks in advance.

Toggling off the cardinality check (hive.merge.cardinality.check=false) will lead to data corruption, if it works at all.

Check your data and fix the root cause. The problem is that more than one row in TABLE2 matches the same row in TABLE1. Either the join keys contain duplicates, which you can fix with a row_number filter or distinct, or your ON clause needs additional keys so that each source row matches at most one target row.
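As a sketch based on the ON clause above (columns A and B are taken from your query; the ORDER BY column and everything else is a placeholder you should adapt), you can first locate the offending keys, then deduplicate the source inside the USING clause before merging:

```sql
-- 1) Find join keys in TABLE2 that occur more than once (the cause of the violation)
SELECT LOWER(TRIM(A)) AS key_a, LOWER(TRIM(B)) AS key_b, COUNT(*) AS cnt
FROM TABLE2
GROUP BY LOWER(TRIM(A)), LOWER(TRIM(B))
HAVING COUNT(*) > 1;

-- 2) Keep exactly one row per join key in the source of the MERGE
MERGE INTO TABLE1 A
USING (
  SELECT * FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
             PARTITION BY LOWER(TRIM(A)), LOWER(TRIM(B))
             ORDER BY A  -- choose a deterministic ORDER BY that picks the row you want to keep
           ) AS rn
    FROM TABLE2 t
  ) s
  WHERE s.rn = 1
) B
ON LOWER(TRIM(A.A)) = LOWER(TRIM(B.A)) AND
   LOWER(TRIM(A.B)) = LOWER(TRIM(B.B))
WHEN MATCHED THEN UPDATE SET ...;  -- your UPDATE SET list as in the original query
```

The first query tells you which keys are duplicated so you can decide whether to deduplicate or to extend the ON clause with more columns; the second enforces at-most-one source row per target row, which is exactly what the cardinality check verifies.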