How does Hadoop's HDFS High Availability feature affect the CAP theorem?
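From everything I have read so far about the CAP theorem, no distributed system can provide all three of the following at the same time: availability, consistency, and partition tolerance.
Hadoop 2.x introduced a new feature that can be configured to remove the single point of failure a Hadoop cluster used to have (the single NameNode). With that, the cluster becomes highly available, consistent, and partition tolerant.
Am I right, or am I missing something? According to CAP, a system that tries to provide all three properties should pay a price in latency. Does the new feature add that latency to the cluster? Or has Hadoop cracked the CAP theorem?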
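HDFS does not provide availability in the face of multiple correlated failures (for example, the failure of the three DataNodes that hold the replicas of the same HDFS block).
From CAP Confusion: Problems with partition tolerance: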
Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in real networks, however, where partitions and failures will occur, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.
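HDFS High Availability makes HDFS more available, but not completely available. If a network partition leaves the client unable to communicate with either NameNode, the cluster is effectively unavailable.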
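To make the two failure cases above concrete, here is a minimal toy model in Python. It is only a sketch of the argument, not how HDFS is implemented: the `ToyCluster` class, the `can_read` function, and the node names (`nn1`, `dn1`, and so on) are all hypothetical, and the model assumes a read needs one reachable NameNode plus one reachable DataNode holding a replica of the block.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class ToyCluster:
    """Hypothetical toy model of an HDFS cluster; not real HDFS behaviour."""
    namenodes: Set[str]                  # e.g. the active and standby NameNodes
    block_replicas: Dict[str, Set[str]]  # block id -> DataNodes holding a replica


def can_read(cluster: ToyCluster, block_id: str, reachable: Set[str]) -> bool:
    """Assumption: a read succeeds only if the client can reach at least one
    NameNode and at least one DataNode that holds a replica of the block."""
    has_namenode = bool(cluster.namenodes & reachable)
    has_replica = bool(cluster.block_replicas[block_id] & reachable)
    return has_namenode and has_replica


cluster = ToyCluster(
    namenodes={"nn1", "nn2"},
    block_replicas={"blk_1": {"dn1", "dn2", "dn3"}},
)
all_nodes = {"nn1", "nn2", "dn1", "dn2", "dn3", "dn4"}

# One NameNode is lost: the second NameNode keeps the block readable.
one_namenode_down = can_read(cluster, "blk_1", all_nodes - {"nn1"})

# A partition cuts the client off from both NameNodes: unavailable.
both_namenodes_cut_off = can_read(cluster, "blk_1", all_nodes - {"nn1", "nn2"})

# All three DataNodes holding the block fail together: unavailable.
all_replicas_down = can_read(cluster, "blk_1", all_nodes - {"dn1", "dn2", "dn3"})

print(one_namenode_down, both_namenodes_cut_off, all_replicas_down)
```

In this model, a second NameNode only removes one way of becoming unavailable. The other two cases still fail, which is the sense in which HA improves availability without escaping the trade-off CAP describes.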