Hadoop HDFS - Difference between Missing replica and Under replicated blocks

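I know that under-replicated blocks and misreplicated blocks both occur when the number of data nodes is lower than the configured replication factor.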

But what is the difference between them?

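With only 1 data node available, resetting the replication factor to 1 cleared both the under-replicated blocks and the missing replicas errors. I confirmed this by running hdfs fsck /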
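For anyone who wants to check these counters from a script rather than by eye, here is a minimal Python sketch. It assumes the fsck summary prints lines of the form `Label: <number> ...`; the exact labels and layout can differ between Hadoop versions, so treat `LABELS` as an assumption to adjust. `parse_fsck_summary`, `run_fsck` and `SAMPLE` are hypothetical names, and the sample text is made up for illustration, not captured from a real cluster.

```python
import re
import subprocess

# Summary labels to look for. Assumed to match your Hadoop version's fsck output.
LABELS = (
    "Under-replicated blocks",
    "Mis-replicated blocks",
    "Missing replicas",
    "Corrupt blocks",
)


def parse_fsck_summary(text):
    """Return {label: count} for every known label found in fsck output."""
    counts = {}
    for label in LABELS:
        # Match "<label>:" at the start of a line, then the first integer.
        pattern = r"^\s*" + re.escape(label) + r":\s*(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            counts[label] = int(match.group(1))
    return counts


def run_fsck(path="/"):
    """Run `hdfs fsck <path>` and parse its summary (needs hdfs on PATH)."""
    result = subprocess.run(
        ["hdfs", "fsck", path], capture_output=True, text=True
    )
    return parse_fsck_summary(result.stdout)


# Made-up sample text, only to show the parser at work.
SAMPLE = """\
 Under-replicated blocks:   12 (100.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Missing replicas:          24 (66.7 %)
 Corrupt blocks:            0
"""

if __name__ == "__main__":
    print(parse_fsck_summary(SAMPLE))
```

On a real cluster you would call `run_fsck("/")` before and after changing the replication factor and compare the two dictionaries.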

From Tom White's "Hadoop: The Definitive Guide":

Over-replicated blocks These are blocks that exceed their target replication for the file they belong to. Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas.

Under-replicated blocks These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication. You can get information about the blocks being replicated (or waiting to be replicated) using hdfs dfsadmin -metasave.

Misreplicated blocks These are blocks that do not satisfy the block replica placement policy (see Replica Placement). For example, for a replication level of three in a multirack cluster, if all three replicas of a block are on the same rack, then the block is misreplicated because the replicas should be spread across at least two racks for resilience. HDFS will automatically re-replicate misreplicated blocks so that they satisfy the rack placement policy.

Corrupt blocks These are blocks whose replicas are all corrupt. Blocks with at least one noncorrupt replica are not reported as corrupt; the namenode will replicate the noncorrupt replica until the target replication is met.

Missing replicas These are blocks with no replicas anywhere in the cluster.
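To make the distinctions concrete, here is a toy Python model of the definitions quoted above. It is only a sketch of the book's wording, not the namenode's actual logic, and fsck on a real cluster may count things differently. `classify_block` and its parameters are hypothetical names, and the `min_racks=2` default is an assumption taken from the multirack example in the quote.

```python
def classify_block(target_replication, replica_racks, corrupt_flags, min_racks=2):
    """Label one block according to the quoted definitions.

    replica_racks: rack name of each existing replica, e.g. ["/rack1", "/rack2"]
    corrupt_flags: one bool per replica, True if that replica is corrupt
    min_racks:     number of racks the placement policy wants replicas on
                   (assumed 2 for a multirack cluster; use 1 for a single rack)
    """
    if len(replica_racks) != len(corrupt_flags):
        raise ValueError("need exactly one corrupt flag per replica")

    # No replicas anywhere in the cluster.
    if not replica_racks:
        return ["missing"]

    # Every existing replica is corrupt.
    healthy = sum(1 for is_corrupt in corrupt_flags if not is_corrupt)
    if healthy == 0:
        return ["corrupt"]

    labels = []
    if healthy > target_replication:
        labels.append("over-replicated")
    elif healthy < target_replication:
        labels.append("under-replicated")

    # Placement check: replicas should span at least min_racks racks.
    if target_replication >= min_racks and len(set(replica_racks)) < min_racks:
        labels.append("misreplicated")

    return labels or ["healthy"]


if __name__ == "__main__":
    # One data node on a single-rack cluster, replication factor 3.
    print(classify_block(3, ["/rack1"], [False], min_racks=1))
    # Same block after the replication factor is lowered to 1.
    print(classify_block(1, ["/rack1"], [False], min_racks=1))
```

The two calls at the bottom mirror the situation in the question: with a single data node and a target of 3 the block is under-replicated, and lowering the target to 1 makes it healthy again.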

Hope this answers your question.