"Not scheduling suspect block" 垃圾邮件 hdfs datanode 日志
"Not scheduling suspect block" spams hdfs datanode log
查看dn日志,发现有很多日志如下:
2016-07-01 05:53:29,328 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,330 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,334 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,340 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,342 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,344 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
2016-07-01 05:53:29,346 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
异常堆栈跟踪如下所示:
2016-07-01 11:56:38,601 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(x.x.x.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):encountered exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1103265984_29531546 to /x.x.x.21:44153
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:586)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:769)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:551)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)
我们发现与这些日志相关的块与hbase rs有关系,根据日志,rs尝试读取文件。
所以我的问题是,在哪种情况下 rs 会尝试读取文件?从日志看,rs尝试每秒发起一次新的读取请求。
如有任何建议或想法,我们将不胜感激。
hbase 重复读取blk_1105510536_31776579! "java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer"
导致的所有不成功读取
2016-06-30 11:21:49,079 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40140
2016-06-30 11:21:49,148 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40154
2016-06-30 11:21:50,509 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40157
2016-06-30 11:21:51,229 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40168
2016-06-30 11:21:51,921 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40181
2016-06-30 11:21:51,946 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40187
2016-06-30 11:21:53,247 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40188
原来是因为DFSInputStream在读取前做seeking导致的。如果InputStream被over read,则从target pos开始发起新的请求。
查看dn日志,发现有很多日志如下:
2016-07-01 05:53:29,328 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,330 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,334 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,340 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,342 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,344 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently. 2016-07-01 05:53:29,346 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data12/hadoop/dfs, DS-086bc494-d862-470c-86e8-9cb7929985c6): Not scheduling suspect block BP-360285305-10.130.1.11-1444619256876:blk_1095475173_21737939 for rescanning, because we rescanned it recently.
异常堆栈跟踪如下所示:
2016-07-01 11:56:38,601 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(x.x.x.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):encountered exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1103265984_29531546 to /x.x.x.21:44153 java.io.IOException: Connection reset by peer at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427) at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492) at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:586) at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:769) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:551) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:745)
我们发现与这些日志相关的块与hbase rs有关系,根据日志,rs尝试读取文件。
所以我的问题是,在哪种情况下 rs 会尝试读取文件?从日志看,rs尝试每秒发起一次新的读取请求。
如有任何建议或想法,我们将不胜感激。
hbase 重复读取blk_1105510536_31776579! "java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer"
导致的所有不成功读取2016-06-30 11:21:49,079 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40140 2016-06-30 11:21:49,148 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40154 2016-06-30 11:21:50,509 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40157 2016-06-30 11:21:51,229 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40168 2016-06-30 11:21:51,921 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40181 2016-06-30 11:21:51,946 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40187 2016-06-30 11:21:53,247 TRACE org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.130.1.29:50010, datanodeUuid=f3d795cc-2b3b-43b9-90c3-e4157c031d2c, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a99b693d-6f26-48fe-ad37-9f8162f70b22;nsid=920937379;c=0):Ignoring exception while serving BP-360285305-10.130.1.11-1444619256876:blk_1105510536_31776579 to /10.130.1.21:40188
原来是因为DFSInputStream在读取前做seeking导致的。如果InputStream被over read,则从target pos开始发起新的请求。