Need some clarification on running Cassandra nodetool repairs

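We've been unable to balance the workload on our current cluster, mainly because of budget constraints and not being able to add more nodes at this time. Until recently, nodes would frequently go down overnight, so I was running nodetool repair a lot. The cluster has become more stable lately and nodes no longer go down as often, so last weekend I created a cron job on each node that runs nodetool repair -pr weekly. gc_grace is still at the default of 10 days, and the max hint window is still at the default of 3 hours.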

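My questions are:

1) If we lose a node for longer than 3 hours, what exactly happens to the hint/s? Does it/they no longer exist?
2) If we lost a node for longer than the 3 hours but for some reason didn't realize that the node had been down that long, what will happen if the nodetool repair -pr is run rather than the full repair on the downed node?
3) How would you fix the issue/s from question 2 if that is in fact the case?
4) Is there a way to check that all nodes are significantly consistent/repaired?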

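This hasn't happened yet (at least I don't think it has), but I'm trying to plan ahead for the worst case, since our cluster's stability may or may not last in the long run, so I'd rather be as prepared as I can.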

1) If we lose a node for longer than 3 hours, what exactly happens to the hint/s? Does it/they no longer exist?

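Yes, that's right: your hints will be deleted (tombstoned), and they will go away through the regular compaction process. You can actually see this for yourself by just selecting from the system.hints table.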

Check out our docs and Jonathan's blog post on HH.
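To make the consequence of the 3 hour window concrete, here is a minimal Python sketch. It assumes my understanding of `max_hint_window_in_ms` is right, namely that coordinators stop generating new hints for a node once it has been down longer than the window. The function name and the write times are made up for illustration; this is not Cassandra code.

```python
from datetime import timedelta

# Default max_hint_window_in_ms is 10800000 ms, i.e. 3 hours
MAX_HINT_WINDOW = timedelta(hours=3)


def writes_without_hints(write_offsets, downtime, window=MAX_HINT_WINDOW):
    """Return the writes that happened during the outage but after the hint window.

    write_offsets: time of each write, measured from the moment the node went down
    downtime: how long the node stayed down
    Assumption: no hint is generated for a write once the node has been
    down longer than `window`.
    """
    return [t for t in write_offsets if window < t <= downtime]


# Hypothetical writes 1h, 2h, 4h and 5h after the node went down
writes = [timedelta(hours=h) for h in (1, 2, 4, 5)]

short_outage = writes_without_hints(writes, downtime=timedelta(hours=2, minutes=30))
long_outage = writes_without_hints(writes, downtime=timedelta(hours=6))
```

With the 6 hour outage, the writes at 4h and 5h have no hint to replay, so only a repair (or read repair) can bring them to the node.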

2) If we lost a node for longer than the 3 hours but for some reason didn't realize that the node had been down that long, what will happen if the nodetool repair -pr is run rather than the full repair on the downed node?

You are potentially serving stale data for the period of time between when that node came back and when you ran repair.

-pr means you only repair the primary range on that machine. If you run repair with -pr across your whole cluster, you will still repair everything.
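A rough way to convince yourself of this is to model the ring. The sketch below is a simplification under stated assumptions: one token per node (no vnodes), a toy token space of 0..99, and helper names that are mine rather than anything from Cassandra. Each node's primary range is taken to be (previous token, own token].

```python
from bisect import bisect_left


def primary_ranges(tokens):
    """Map each node's token to its primary range (previous_token, token]."""
    ring = sorted(tokens)
    # ring[i - 1] wraps around to the last token when i == 0, closing the ring
    return {tok: (ring[i - 1], tok) for i, tok in enumerate(ring)}


def owner_of(key_token, tokens):
    """Return the token of the node whose primary range contains key_token."""
    ring = sorted(tokens)
    idx = bisect_left(ring, key_token)
    return ring[idx % len(ring)]  # keys past the highest token wrap to the first node


def covered(key_token, repaired_nodes, tokens):
    """True if the node owning key_token's primary range ran repair -pr."""
    return owner_of(key_token, tokens) in repaired_nodes


tokens = [0, 25, 50, 75]  # hypothetical 4-node ring
all_keys = range(100)     # toy token space

# repair -pr on every node: every token lands in exactly one repaired primary range
full = all(covered(k, set(tokens), tokens) for k in all_keys)

# repair -pr skipped on the node with token 50: its range (25, 50] is never repaired
missed = [k for k in all_keys if not covered(k, {0, 25, 75}, tokens)]
```

The flip side shows up in `missed`: -pr only covers the whole ring if it really runs on every node, so a node that misses its scheduled run leaves its primary range unrepaired until the next cycle.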

Rather than using cron, I'd recommend you try the OpsCenter repair service to automate this process.

3) How would you fix the issue/s from question 2 if that is in fact the case?

Repair gets you back to a fully consistent baseline, which is why you should run it weekly (or at any interval < gc_grace).
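The arithmetic behind "< gc_grace" can be sketched in a few lines of Python. The one day repair duration is a made-up figure; substitute whatever your repairs actually take. The helper name is hypothetical as well.

```python
from datetime import timedelta

# Default gc_grace_seconds is 864000 seconds, i.e. 10 days
GC_GRACE = timedelta(days=10)


def repair_schedule_is_safe(interval, worst_case_duration, gc_grace=GC_GRACE):
    """True if each range is repaired again before tombstones become purgeable.

    The gap between two completed repairs of the same range is at most
    interval + worst_case_duration, and that gap must stay below gc_grace.
    """
    return interval + worst_case_duration < gc_grace


# Weekly cron job, assuming a repair takes at most one day (hypothetical)
weekly_ok = repair_schedule_is_safe(timedelta(days=7), timedelta(days=1))

# Every two weeks would exceed the 10 day default
biweekly_ok = repair_schedule_is_safe(timedelta(days=14), timedelta(days=1))
```

So a weekly schedule leaves about two days of slack against the default gc_grace, which is worth keeping in mind if a repair run fails or gets skipped.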

4) Is there a way to check that all nodes are significantly consistent/repaired?

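The only way is to build Merkle trees, which is what repair does. Once you've found an inconsistency, you might as well repair it. There is no way to only compare without repairing.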
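For anyone unfamiliar with the idea, here is a toy Merkle tree comparison in Python. It is only a sketch of the general technique, not how Cassandra implements it: the leaves are made-up byte strings standing in for token ranges, the hash is SHA-256, and the leaf count is assumed to be a power of two.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels, leaf hashes first and root last.

    Assumes len(leaves) is a power of two.
    """
    levels = [[_h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top - 1, -1, -1):
        suspects = [
            child
            for idx in suspects
            for child in (2 * idx, 2 * idx + 1)
            if tree_a[level][child] != tree_b[level][child]
        ]
    return suspects


# Two hypothetical replicas; the second holds a stale value in range 2
replica_a = [b"k0=v1", b"k1=v1", b"k2=v1", b"k3=v1"]
replica_b = [b"k0=v1", b"k1=v1", b"k2=v0", b"k3=v1"]

out_of_sync = differing_leaves(build_tree(replica_a), build_tree(replica_b))
in_sync = differing_leaves(build_tree(replica_a), build_tree(replica_a))
```

Building the trees is the expensive part, since every replica has to read and hash its data. Once the mismatching ranges are known, streaming them is the small remaining step, which is why comparing without repairing would save very little.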

Note: the hint improvements in 3.0 are nice, check out Aleksey's post: http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery

datastax-enterprise

nodetool

cassandra-2.0
