code=1200 error after upgrading Cassandra to 3.11.3

Question

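I just inherited a system with 3 nodes: 2 in one data center with a replication factor of 2, and 1 in a second data center with a replication factor of 1. The system was upgraded from Cassandra 3.9 to Cassandra 3.11.3. Since the upgrade, every query in cqlsh returns this error: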

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}

Can anyone suggest what is causing this, or where I should look for the problem?

Edit: I retried my query with a consistency level of ONE, but I still get the error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Answer 1

Too long for a comment...

There are a few things that could cause this.

1 - How big is your largest partition? I would check it with the following:

bin/nodetool tablestats yourKeyspaceName.ablog | grep "partition maximum"

If the result is in the double-digit GB range, you have a problem.
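If you want to script that check, here is a minimal sketch in Python. It assumes the relevant tablestats line looks like "Compacted partition maximum bytes: <n>" (the format 3.x nodetool prints); the function names and the 10 GiB cutoff are hypothetical, the cutoff simply encodes the "double-digit GB" rule of thumb above.

```python
import re
from typing import Optional

# Hypothetical cutoff for "double-digit GB", taken from the rule of thumb above.
LARGE_PARTITION_BYTES = 10 * 1024 ** 3

def max_partition_bytes(tablestats_output: str) -> Optional[int]:
    """Return the 'Compacted partition maximum bytes' value, or None if it is absent."""
    match = re.search(r"Compacted partition maximum bytes:\s*(\d+)", tablestats_output)
    return int(match.group(1)) if match else None

def describe_partition_size(tablestats_output: str) -> str:
    """Summarize the largest partition and flag it when it reaches the cutoff."""
    size = max_partition_bytes(tablestats_output)
    if size is None:
        return "no partition maximum found"
    gib = size / 1024 ** 3
    if size >= LARGE_PARTITION_BYTES:
        return f"largest partition is {gib:.1f} GiB: likely too large"
    return f"largest partition is {gib:.3f} GiB"

# Hand-written sample line in the assumed format (sample data, not real output).
sample = "\t\tCompacted partition maximum bytes: 21474836480\n"
print(describe_partition_size(sample))  # largest partition is 20.0 GiB: likely too large
```

In practice you would pass it the text printed by the nodetool tablestats command above for a single table; only the first matching line is read.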

2 - Are there tombstones? You can check with a similar command:

bin/nodetool tablestats yourKeyspaceName.ablog | grep "tombstones"

If the numbers returned have 3 or 4 digits, that could be a problem.
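The same idea works for tombstones. This is a rough Python sketch that mimics the grep: it keeps lines containing "tombstones" and reads the number after the last colon, assuming lines such as "Maximum tombstones per slice (last five minutes): 1535". The threshold of 100 (i.e. 3 digits and up) and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical cutoff: the answer treats 3- or 4-digit tombstone counts as suspect.
TOMBSTONE_WARN = 100
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def tombstone_stats(tablestats_output: str) -> Dict[str, float]:
    """Collect numeric values from lines containing 'tombstones' (like the grep above).

    Non-numeric values (for example 'NaN') are skipped.
    """
    stats: Dict[str, float] = {}
    for line in tablestats_output.splitlines():
        if "tombstones" not in line:
            continue
        label, _, value = line.rpartition(":")
        value = value.strip()
        if label and NUMBER.fullmatch(value):
            stats[label.strip()] = float(value)
    return stats

def flag_tombstones(stats: Dict[str, float]) -> List[str]:
    """Return the labels whose value reaches TOMBSTONE_WARN."""
    return [label for label, value in stats.items() if value >= TOMBSTONE_WARN]

# Hand-written sample lines in the assumed format (sample data, not real output).
sample = (
    "\t\tAverage tombstones per slice (last five minutes): 2.5\n"
    "\t\tMaximum tombstones per slice (last five minutes): 1535\n"
)
print(flag_tombstones(tombstone_stats(sample)))
# ['Maximum tombstones per slice (last five minutes)']
```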

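3 - Downgrade to 3.11.2. Versions 3.11.2 and 3.11.3 use the same SSTable format, so it is just a matter of switching the binaries. Download and untar 3.11.2, move the conf directory from your 3.11.3 directory into it, and you should be good to go.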

I only suggest this because you may be running into CASSANDRA-14672.

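4 - LOCAL_QUORUM with RF=2. As I mentioned in the comments, querying at LOCAL_QUORUM with RF < 3 is no different from querying at ALL. Cassandra calculates a quorum (majority) as follows, where the division is integer division: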

QUORUM = (RF / 2) + 1 = (2 / 2) + 1 = 2 (replicas required to respond)

Really, you gain nothing by doing this. It only makes sense when your RF is 3 or more:

QUORUM = (RF / 2) + 1 = (3 / 2) + 1 = 2 (replicas required to respond)

In fact, querying at QUORUM with RF=2 actually hurts you, because you cannot tolerate even a single node going down.
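To make the arithmetic concrete, here is a small Python sketch of the quorum formula above (integer division), plus how many replicas can be down while a quorum is still reachable. For LOCAL_QUORUM, RF is the replication factor of the local data center, so RF=2 gives the 'required_responses': 2 seen in the question. The function names are just for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM / LOCAL_QUORUM: (RF / 2) + 1, integer division."""
    if rf < 1:
        raise ValueError("replication factor must be >= 1")
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a quorum can still respond."""
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated down replicas={tolerated_failures(rf)}")
# RF=1: quorum=1, tolerated down replicas=0
# RF=2: quorum=2, tolerated down replicas=0
# RF=3: quorum=2, tolerated down replicas=1
# RF=5: quorum=3, tolerated down replicas=2
```

RF=2 tolerates no failures at all, just like RF=1, which is why quorum-based consistency only starts paying off at RF=3.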


Tags: cassandra, cassandra-3.0