Jobs finishing successfully even though IOException occurs

When running GridMix, my master node logs various IOExceptions, and I'd like to know whether this is something I should genuinely be concerned about, or whether it is just a transient event, given that my jobs complete successfully:

IOException: Bad connect ack with firstBadLink: \
java.io.IOException: Bad response ERROR for block BP-49483579-10.0.1.190-1449960324681:blk_1073746606_5783 from datanode 10.0.1.192:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)

Should I just ignore it?

try {
  ...
} catch (IOException iox) {
  //***NOP***
}

I can't be sure without knowing your complete setup, but most likely these exceptions occur during pipeline setup for an append; in code terms, when stage == BlockConstructionStage.PIPELINE_SETUP_APPEND.

In any case, since your jobs complete successfully, you don't need to worry. They succeed because when an exception occurs while opening a DataOutputStream to the DataNode pipeline, the client simply keeps retrying until the pipeline is set up.
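The retry behaviour described above can be sketched as a simplified, stand-alone loop. This is only an illustration of the pattern, not Hadoop's actual code; the `Attempt` interface and `retryUntilSuccess` helper are hypothetical names:

```java
import java.io.IOException;

public class RetryDemo {

    // Hypothetical stand-in for one pipeline-setup attempt.
    interface Attempt {
        boolean tryOnce() throws IOException;
    }

    static int attempts;

    // Keep retrying until the operation succeeds or we give up,
    // mirroring the "keeps on trying until a pipeline is setup" loop.
    static boolean retryUntilSuccess(Attempt op, int maxAttempts) {
        boolean success = false;
        while (!success && attempts < maxAttempts) {
            attempts++;
            try {
                success = op.tryOnce();
            } catch (IOException iox) {
                // Transient failure (e.g. "Bad connect ack"): log it and retry.
                System.err.println("attempt " + attempts + " failed: " + iox.getMessage());
            }
        }
        return success;
    }

    public static void main(String[] args) {
        // Simulated attempt that fails twice before succeeding,
        // like a flaky DataNode connection.
        boolean ok = retryUntilSuccess(() -> {
            if (attempts < 3) {
                throw new IOException("Bad connect ack with firstBadLink");
            }
            return true;
        }, 10);
        System.out.println("pipeline set up = " + ok + " after " + attempts + " attempts");
    }
}
```

This is why the job still finishes cleanly: the intermediate IOExceptions are logged, but the write only fails if every retry is exhausted.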

The exception originates in org.apache.hadoop.hdfs.DFSOutputStream; below is the relevant code snippet for reference.

 private boolean createBlockOutputStream(DatanodeInfo[] nodes, long newGS, boolean recoveryFlag) {
    //Code..
    if (pipelineStatus != SUCCESS) {
      if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {
        throw new InvalidBlockTokenException(
            "Got access token error for connect ack with firstBadLink as "
                + firstBadLink);
      } else {
        throw new IOException("Bad connect ack with firstBadLink as "
            + firstBadLink);
      }
    }
    //Code..
}

Now, createBlockOutputStream is invoked by setupPipelineForAppendOrRecovery, and as the code comment on that method states - "It keeps on trying until a pipeline is setup".

/**
 * Open a DataOutputStream to a DataNode pipeline so that 
 * it can be written to.
 * This happens when a file is appended or data streaming fails
 * It keeps on trying until a pipeline is setup
 */
private boolean setupPipelineForAppendOrRecovery() throws IOException {
    //Code..
    while (!success && !streamerClosed && dfsClient.clientRunning) {
        //Code..
        success = createBlockOutputStream(nodes, newGS, isRecovery);
    }
    //Code..
}

If you read through the full org.apache.hadoop.hdfs.DFSOutputStream code, you will see that the pipeline setup attempts keep going until a pipeline is created, whether for an append or for a fresh write.

If you want to address it, you can try tuning the dfs.datanode.max.xcievers property in hdfs-site.xml; most people have reported this same fix. Note that the hadoop services need to be restarted after setting the property:
<property>
        <name>dfs.datanode.max.xcievers</name>
        <value>8192</value>
</property>
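As a side note worth verifying against your Hadoop version: in Hadoop 2.x this property was deprecated in favour of the corrected spelling dfs.datanode.max.transfer.threads, so on newer clusters the equivalent setting would be:

<property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>8192</value>
</property>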