小型数据集上的 Hive 查询永远不会完成(或 OOM)

Hive query on small dataset never finishes (or OOM)

对小样本数据集(195 行,22 列)执行简单查询要么引发内存不足异常,要么按照许多建议增加内存大小,永远不会结束。

尝试过的选项

有时 OOM 错误消失了,但它运行了几个小时却没有任何结果...

查询

select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table

永不停止的示例查询

 hive -hiveconf hive.tez.container.size=2048 -hiveconf hive.tez.java.opts=-Xmx1640m -hiveconf tez.runtime.io.sort.mb=820 -hiveconf tez.runtime.unordered.output.buffer.size-mb=205 -e "select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table"

内存不足

Status: Running (Executing on YARN cluster with App id application_1473144435077_0015)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED      1          0        0        1       4       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 18.30 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1473144435077_0015_1_00, diagnostics=[Task failed, taskId=task_1473144435077_0015_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.run(TezTaskRunner.java:181)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.run(TezTaskRunner.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:172)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:116)
        at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:142)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
        ... 14 more

永不止步 (33 秒是示例,数小时内不会停止)

Status: Running (Executing on YARN cluster with App id application_1473144435077_0025)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED      1          0        0        1       0       0
Reducer 2             INITED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 33.32 s
--------------------------------------------------------------------------------

我花了很长时间才找到答案,希望这对其他人有帮助...

所以这分解为 2 个问题:

  1. 堆大小太小,通过增加堆大小解决
  2. Hive 作业卡在挂起状态

以下命令解决了我的问题

hive -hiveconf hive.tez.container.size=512 -hiveconf hive.tez.java.opts="-server -Xmx512m -Djava.net.preferIPv4Stack=true" -e "select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table"

Source