Kafka: Leader none, and Kafka broker id not registered in ZooKeeper

We have 3 Kafka brokers (3 Linux machines) on RHEL 7.6.

The Kafka version is 2.7.X.

The broker ids are 1010, 1011, 1012.

From the Kafka topic describe output we can see the following:

        Topic: __consumer_offsets       Partition: 0    Leader: none    Replicas: 1011,1010,1012        Isr: 1010
        Topic: __consumer_offsets       Partition: 1    Leader: 1012    Replicas: 1012,1011,1010        Isr: 1012,1011
        Topic: __consumer_offsets       Partition: 2    Leader: 1011    Replicas: 1010,1012,1011        Isr: 1011,1012
        Topic: __consumer_offsets       Partition: 3    Leader: none    Replicas: 1011,1012,1010        Isr: 1010
        Topic: __consumer_offsets       Partition: 4    Leader: 1011    Replicas: 1012,1010,1011        Isr: 1011
        Topic: __consumer_offsets       Partition: 5    Leader: none    Replicas: 1010,1011,1012        Isr: 1010
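With more partitions than fit on a screen, it can help to pull the leaderless ones out of the describe output with a script. This is only a rough sketch and not something from the cluster itself: `parse_describe` and `leaderless` are names made up for illustration, and the regex assumes the line layout shown above.

```python
import re

# Matches one partition line of the describe output shown above.
DESCRIBE_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Turn partition lines of the describe output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = DESCRIBE_LINE.search(line)
        if not m:
            continue  # skip headers and anything that is not a partition line
        leader = m["leader"]
        rows.append({
            "topic": m["topic"],
            "partition": int(m["partition"]),
            # "none" (or anything non-numeric) means no leader is elected
            "leader": int(leader) if leader.isdigit() else None,
            "replicas": [int(b) for b in m["replicas"].split(",")],
            "isr": [int(b) for b in m["isr"].split(",") if b],
        })
    return rows


def leaderless(rows):
    """Keep only the partitions that currently have no leader."""
    return [r for r in rows if r["leader"] is None]


# Three lines copied from the output above.
SAMPLE = """\
 Topic: __consumer_offsets       Partition: 0    Leader: none    Replicas: 1011,1010,1012        Isr: 1010
        Topic: __consumer_offsets       Partition: 1    Leader: 1012    Replicas: 1012,1011,1010        Isr: 1012,1011
        Topic: __consumer_offsets       Partition: 3    Leader: none    Replicas: 1011,1012,1010        Isr: 1010
"""

for row in leaderless(parse_describe(SAMPLE)):
    print(row["topic"], row["partition"], "isr:", row["isr"])
```

In the sample, every leaderless partition has broker 1010 as its only in-sync replica.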

From the ZooKeeper CLI we can see that broker id 1010 is not registered:

[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]

And in the log state-change.log we can see the following:

[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
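Since state-change.log is about 200 MB, a small script can summarize which partitions each broker reported as sitting in an offline log directory. This is a sketch under assumptions: `offline_partitions` is a hypothetical helper, and the pattern is taken only from the warning lines quoted above.

```python
import re

# Pattern derived from the WARN lines quoted above.
OFFLINE_WARN = re.compile(
    r"\[Broker id=(?P<broker>\d+)\] Ignoring LeaderAndIsr request .* "
    r"for partition (?P<tp>\S+) as the local replica for the partition "
    r"is in an offline log directory"
)


def offline_partitions(log_text):
    """Map broker id -> sorted list of topic-partitions it reported as offline."""
    found = {}
    for line in log_text.splitlines():
        m = OFFLINE_WARN.search(line)
        if m:
            found.setdefault(int(m["broker"]), set()).add(m["tp"])
    return {broker: sorted(tps) for broker, tps in found.items()}


# Two of the lines from the log excerpt above.
SAMPLE_LOG = """\
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
"""

for broker, tps in offline_partitions(SAMPLE_LOG).items():
    print(broker, len(tps), tps)
```

For the real file, read it with `open(path).read()` (or iterate line by line) instead of using the inline sample.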

From ls -ltr we can see that controller.log and state-change.log have not been updated since Dec 16:
-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka  68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka   6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka    323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka  15669106 Dec 20 19:50 kafkaServer.out

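What we have done so far:

We restarted all 3 ZooKeeper servers. We restarted all Kafka brokers.

But the partitions above still show Leader: none, and broker 1010 is still missing from the ZooKeeper data.

Additional info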

[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0

From kafka01:

more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010

A related thought

On the topic data disk we have the following files (in addition to the topic partition directories):

-rw-r--r-- 1 root kafka    91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka   161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka  1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka    80 Dec 17 09:42 log-start-offset-checkpoint

Any idea whether deleting one or more of the above files would help with our issue?

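All you have shown is that broker 1010 is unhealthy, and you probably have unclean leader election disabled.

ls /brokers/ids shows you the running, healthy brokers from ZooKeeper's point of view.

Meanwhile, the data in the topic znodes (under /brokers/topics) still refers to a broker listed in the replica set that is not running, or at least not reporting back to ZooKeeper. You would see the reason for that in server.log.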
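To make the "Leader: none" result concrete, here is a deliberately simplified model of how a leader is chosen for an offline partition. It is an illustration, not Kafka's actual code: `elect_leader` is a made-up function, and it assumes the controller prefers the first replica in assignment order that is both alive and in the ISR, and only falls back to any live replica when unclean leader election is enabled.

```python
def elect_leader(replicas, isr, live_brokers, unclean=False):
    """Simplified model of offline-partition leader election.

    replicas:     assigned replica ids, in assignment order
    isr:          ids currently in the in-sync replica set
    live_brokers: ids registered under /brokers/ids
    unclean:      whether unclean leader election is allowed
    """
    live = set(live_brokers)
    in_sync = set(isr)

    # Clean election: the leader must be alive and in sync.
    for broker in replicas:
        if broker in live and broker in in_sync:
            return broker

    # Unclean election: accept any live replica, at the risk of data loss.
    if unclean:
        for broker in replicas:
            if broker in live:
                return broker

    return None  # shown as "Leader: none"


# Values for __consumer_offsets partition 0 from the question.
replicas, isr, live = [1011, 1010, 1012], [1010], {1011, 1012}

print(elect_leader(replicas, isr, live))                # None
print(elect_leader(replicas, isr, live, unclean=True))  # 1011
```

Under this model the only in-sync replica (1010) is not registered in ZooKeeper, so a clean election has no candidate.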

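If you have another broker, you can use the partition reassignment tool to manually remove or replace broker 1010 in every partition of every topic it hosts. This removes the old replica information from ZooKeeper and should force a new leader.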
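The reassignment tool takes a JSON file of the form `{"version": 1, "partitions": [{"topic": ..., "partition": ..., "replicas": [...]}]}`. As a rough sketch, the plan could be generated like this. The helper `reassignment_without` and the spare broker id 1013 are hypothetical; adjust them to whatever brokers you actually have.

```python
import json


def reassignment_without(partitions, dead_broker, live_brokers):
    """Build a reassignment plan that drops dead_broker from every replica list.

    If a live broker that does not already hold the partition is available,
    it takes the freed slot so the replication factor is preserved;
    otherwise the replica list simply shrinks.
    """
    plan = []
    for p in partitions:
        if dead_broker not in p["replicas"]:
            continue  # partition is not hosted on the dead broker
        kept = [b for b in p["replicas"] if b != dead_broker]
        spare = [b for b in sorted(live_brokers) if b not in kept]
        missing = len(p["replicas"]) - len(kept)
        plan.append({
            "topic": p["topic"],
            "partition": p["partition"],
            "replicas": kept + spare[:missing],
        })
    return {"version": 1, "partitions": plan}


# Partition 0 from the question; 1013 is an assumed extra broker.
partitions = [
    {"topic": "__consumer_offsets", "partition": 0, "replicas": [1011, 1010, 1012]},
]
plan = reassignment_without(partitions, dead_broker=1010,
                            live_brokers={1011, 1012, 1013})
print(json.dumps(plan, indent=2))
```

Save the output to a file and pass it to `kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute`. Review the plan before running it.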

You should not delete the checkpoint files, but you can delete the old, rotated log files once you are sure you no longer need them.