Hive: Kryo Exception

I am executing one of my HQL queries, which has a few joins, unions, and INSERT OVERWRITE operations. If I run it just once, it works fine.
If I execute the same job a second time, I hit the error below. Can someone help me identify under which circumstances this exception occurs?

Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:364)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:440)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:433)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)

Avoid Hive's parallel execution by setting the property below to false.

hive.exec.parallel

Let me know if that works for you.
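A minimal sketch of applying this workaround at the session level (hive.exec.parallel normally allows independent stages of a query to run concurrently; setting it to false makes the stages run one after another):

-- Run this in the same session, before the failing query:
SET hive.exec.parallel=false;

The setting only affects the current session; it can also be made permanent in hive-site.xml if many jobs are affected, at the cost of slower multi-stage queries overall.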

I tried set hive.exec.parallel = false; and the job then ran successfully, though more slowly. My code is:

SELECT
    CASE WHEN a.did IS NOT NULL THEN a.did ELSE b.did END AS device_id,
    CASE WHEN a.did IS NOT NULL THEN a.package ELSE b.package END AS package,
    CASE WHEN a.did IS NOT NULL THEN a.channel ELSE b.channel END AS channel,
    CASE WHEN a.did IS NOT NULL THEN a.time ELSE b.time END AS time
FROM
    (SELECT
      a1.package,
      a1.did,
      MIN(a1.source) AS channel,
      MIN(a1.time) AS time
    FROM
      (SELECT * FROM thetable
        WHERE date_hour = "20160601"
          AND source_type IN ('A', 'B', 'C')
      ) a1
      JOIN
      (SELECT
        package AS package,
        did AS did,
        MIN(time) AS time
      FROM thetable
      WHERE date_hour = "20160601"
        AND source_type IN ('A', 'B', 'C')
      GROUP BY package, did
      ) min
      ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) a
    FULL OUTER JOIN
    (SELECT
      a1.package,
      a1.did,
      MIN(a1.source) AS channel,
      MIN(a1.time) AS time
    FROM
      (SELECT * FROM thetable
        WHERE date_hour = "20160601"
          AND source_type IN ('D')
      ) a1
      JOIN
      (SELECT
        package AS package,
        did AS did,
        MIN(time) AS time
      FROM thetable
      WHERE date_hour = "20160601"
        AND source_type IN ('D')
      GROUP BY package, did
      ) min
      ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) b
    ON (a.package = b.package AND a.did = b.did);
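Since the asker confirmed that set hive.exec.parallel = false; made the job succeed, the full script would simply prepend that statement. A sketch of how the pieces fit together (the INSERT OVERWRITE target below is a hypothetical placeholder; the question does not show the actual destination table):

SET hive.exec.parallel=false;

-- hypothetical destination table, not taken from the question
INSERT OVERWRITE TABLE first_channel_per_device
SELECT ...  -- the FULL OUTER JOIN query above, unchanged
;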