Hive: Kryo Exception
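I am executing one of my HQL queries, which has a few joins, unions, and insert overwrite operations. It works fine if I run it just once.
If I execute the same job a second time, I hit the problem below.
Can someone help me identify under which conditions this exception occurs?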
Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:364)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:440)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:433)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
Avoid Hive's parallel execution by setting the property below to false.
hive.exec.parallel
Let me know if that works for you.
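A minimal sketch of how the suggestion above might be applied, assuming the query is submitted from an interactive Hive session (Hive CLI or Beeline) so that the setting only affects the current session. The placeholder comment stands in for your own statement.

```sql
-- Print the current value of the property (no value given = show it).
SET hive.exec.parallel;

-- Disable parallel execution of independent stages for this session only.
SET hive.exec.parallel=false;

-- Run the failing query here, unchanged, after the SET statement.
```

Setting it per session rather than in hive-site.xml keeps the change limited to the job that fails, so other queries keep running their stages in parallel.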
I tried set hive.exec.parallel = false;
and the query then ran successfully, although more slowly. My code is:
SELECT
    CASE WHEN a.did IS NOT NULL THEN a.did ELSE b.did END AS device_id,
    CASE WHEN a.did IS NOT NULL THEN a.package ELSE b.package END AS package,
    CASE WHEN a.did IS NOT NULL THEN a.channel ELSE b.channel END AS channel,
    CASE WHEN a.did IS NOT NULL THEN a.time ELSE b.time END AS time
FROM
    (SELECT
        a1.package,
        a1.did,
        MIN(a1.source) AS channel,
        MIN(a1.time) AS time
    FROM
        (SELECT * FROM thetable
        WHERE date_hour = "20160601"
            AND source_type IN ('A', 'B', 'C')
        ) a1
    JOIN
        (SELECT
            package AS package,
            did AS did,
            MIN(time) AS time
        FROM thetable
        WHERE date_hour = "20160601"
            AND source_type IN ('A', 'B', 'C')
        GROUP BY package, did
        ) min
    ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) a
FULL OUTER JOIN
    (SELECT
        a1.package,
        a1.did,
        MIN(a1.source) AS channel,
        MIN(a1.time) AS time
    FROM
        (SELECT * FROM thetable
        WHERE date_hour = "20160601"
            AND source_type IN ('D')
        ) a1
    JOIN
        (SELECT
            package AS package,
            did AS did,
            MIN(time) AS time
        FROM thetable
        WHERE date_hour = "20160601"
            AND source_type IN ('D')
        GROUP BY package, did
        ) min
    ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) b
ON (a.package = b.package AND a.did = b.did);