One DC stopped synchronizing. Error in logs

We started seeing this error in the logs. At the same time, the only node in the DC "datacenter_spark" stopped syncing to the DC "datacenter-prod".

The columns in the error message point to a table we have, but when we compare the nodes in both DCs, the table has the same columns on each.

What could have caused this, and how do we fix it?

The error:

2021-07-20 07:54:03,927 ERROR [ReadStage-1] AbstractLocalAwareExecutorService.java:169 run Uncaught exception on thread Thread[ReadStage-1,5,main]
java.lang.IllegalStateException: [color, icon_image_file, name, type] is not a subset of [icon_image_file name type]
        at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:565)
        at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:497)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:230)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:205)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:137)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:125)
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:137)
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:92)
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:307)
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)
        at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76)
        at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:353)
        at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
        at java.lang.Thread.run(Thread.java:748)

nodetool status:

Datacenter: datacenter-prod
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.164.0.23   143.14 GiB  256          100.0%            e7e2a38a-d4f3-4758-a345-73fcffe26035  rack1
UN  10.164.0.24   146.79 GiB  256          100.0%            0c18b8e4-5ca2-4fb5-9e8c-663b74909fbb  rack1
UN  10.164.0.58   151.03 GiB  256          100.0%            547c0746-72a8-4fec-812a-8b926d2426ae  rack1
Datacenter: datacenter_spark
============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.16.0.179  140.57 GiB  256          100.0%            790cef99-9234-4b2d-8389-c4407ed8cb9b  rack1

The error indicates that the table schema does not match the data on disk.
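To see exactly where the mismatch is, it can help to diff the two column lists in the exception message. The snippet below is only a rough sketch: `missing_columns` is a hypothetical helper, not part of Cassandra, and it assumes the first bracketed list is what the row on disk contains and the second is what the schema currently allows.

```python
import re

def missing_columns(message):
    """Return columns in the first [...] list that are absent from the second."""
    # Pull out the contents of the two bracketed lists in the message.
    lists = re.findall(r"\[([^\]]*)\]", message)
    if len(lists) != 2:
        raise ValueError("expected exactly two bracketed column lists")
    # The log separates names with commas in one list and spaces in the other,
    # so split on any run of commas and/or whitespace.
    in_data = [c for c in re.split(r"[,\s]+", lists[0].strip()) if c]
    in_schema = {c for c in re.split(r"[,\s]+", lists[1].strip()) if c}
    return [c for c in in_data if c not in in_schema]

# Message copied from the stack trace in the question.
msg = ("java.lang.IllegalStateException: [color, icon_image_file, name, type] "
       "is not a subset of [icon_image_file name type]")
print(missing_columns(msg))  # ['color']
```

For the message in the question, the only column present in the data but unknown to the schema is `color`, so that is the one to look into.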

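You need to check schema agreement across all nodes in the cluster. If you changed the schema recently, compare the current table definition against what is on disk.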

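One possibility is that something went wrong when you tried to drop a column and the entry is missing from the system_schema.dropped_columns table. If that is the case, you will need to temporarily add the column back to the table with the same CQL data type and then drop it again. Cheers!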
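For reference, the check and the add-then-drop workaround might look roughly like the statements printed by the sketch below. Everything passed in is an assumption: `my_keyspace` and `my_table` are placeholders, `color` is a guess based on the error message, and `text` stands in for whatever CQL type the column originally had. `readd_then_drop_statements` is a hypothetical helper that only builds strings to paste into cqlsh; it does not connect to the cluster.

```python
def readd_then_drop_statements(keyspace, table, column, cql_type):
    """Build the CQL statements for checking and repairing a lost column drop."""
    target = f"{keyspace}.{table}"
    return [
        # 1. Check whether the drop was recorded for this table at all.
        "SELECT * FROM system_schema.dropped_columns "
        f"WHERE keyspace_name = '{keyspace}' AND table_name = '{table}';",
        # 2. Temporarily add the column back with its original CQL type.
        f"ALTER TABLE {target} ADD {column} {cql_type};",
        # 3. Drop it again so the drop is recorded this time.
        f"ALTER TABLE {target} DROP {column};",
    ]

# Placeholder names and type: replace with your own before running anything.
for stmt in readd_then_drop_statements("my_keyspace", "my_table", "color", "text"):
    print(stmt)
```

If the SELECT already returns a row for the column, the drop was recorded and this is probably not the cause, so skip the ALTER statements and look at schema agreement instead.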