Upgrading Cassandra 3.11 to 4.0 fails with "node with address ... already exists"

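We are trying to upgrade Apache Cassandra 3.11.12 to 4.0.2. This is the first node we are upgrading in this cluster, and it is a seed node. Before swapping in the new version, we drained the node and stopped the service.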

System log:

INFO  [RMI TCP Connection(16)-IP] 2022-03-03 15:50:18,811 StorageService.java:1568 - DRAINED
....
....
INFO  [main] 2022-03-03 15:58:02,970 QueryProcessor.java:167 - Preloaded 0 prepared statements
INFO  [main] 2022-03-03 15:58:02,970 StorageService.java:735 - Cassandra version: 4.0.2
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:736 - CQL version: 3.4.5
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:737 - Native protocol supported versions: 3/v3, 4/v4, 5/v5, 6/v6-beta (default: 5/v5)
...
...
WARN  [main] 2022-03-03 15:58:03,328 SystemKeyspace.java:1130 - No host ID found, created d78ab047-f1f9-4a07-8118-2fa83f4571ef (Note: This should happen exactly once per node).
....
...
ERROR [main] 2022-03-03 15:58:04,543 CassandraDaemon.java:911 - Exception encountered during startup
java.lang.RuntimeException: A node with address /HOST_IP:7001 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:660)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:935)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:785)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:730)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,558 HintsService.java:222 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 Gossiper.java:2032 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 MessagingService.java:441 - Waiting for messaging service to quiesce
...
..
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:06,956 HintsService.java:222 - Paused hints dispatch

Do we need to delete (rm -rf) the system* data directories before starting the new Cassandra version? How can we fix this?

During startup, Cassandra tries to retrieve the host ID by querying the local system table:

SELECT host_id FROM system.local WHERE key = 'local'

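If the system.local table is empty, or the SSTables are missing from the system/local-*/ data subdirectory, Cassandra assumes it is a brand-new node and assigns a new host ID. That is what the "No host ID found" warning in your log shows. In your case, however, Cassandra then learned while gossiping with the other nodes that a node with the same IP address is already part of the cluster, so it cancelled the join.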
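A quick way to check the second condition is to look for SSTable data files under system/local-*/. The following is only a minimal sketch: the function name `local_sstables` is hypothetical, the path /var/lib/cassandra/data is an assumed default (use the data_file_directories value from your cassandra.yaml), and it relies on SSTable data components having names ending in -Data.db. It only tells you whether files exist on disk, not whether the table contains a row.

```python
from pathlib import Path


def local_sstables(data_dir):
    """Return SSTable data files found under <data_dir>/system/local-*/.

    Hypothetical helper for illustration. Only files directly inside the
    table directory are matched, so snapshots/ and backups/ are ignored.
    """
    return sorted(Path(data_dir).glob("system/local-*/*-Data.db"))


if __name__ == "__main__":
    # Assumed location; replace with your data_file_directories entry.
    found = local_sstables("/var/lib/cassandra/data")
    if found:
        for path in found:
            print(path)
    else:
        print("no SSTables found for system.local")
```

If this reports no files on a node that was previously part of the cluster, the system.local data was removed or the node is pointing at a different data directory than before the upgrade.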

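You need to figure out why Cassandra cannot read the local system.local table. If someone deleted system/local-*/ from the data directory, you will not be able to start the node again. If that is the case, you need to start from scratch, which includes: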

  • wiping all the contents of data/, commitlog/ and saved_caches/
  • uninstalling C* 4.0
  • reinstalling C* 3.11

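Then you will need to replace the node "with itself" using the replace_address method (the cassandra.replace_address option named in the error message). Cheers!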