Kafka: Leader none, and Kafka broker id not registered in ZooKeeper
We have 3 Kafka brokers (3 Linux machines) running on Linux RHEL 7.6.
The Kafka version is 2.7.X.
The broker IDs are 1010, 1011, 1012.
From the Kafka topic describe output we can see the following:
Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010
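A minimal sketch, assuming the describe output above has been saved as text, of how the leaderless partitions could be picked out programmatically. The helper names (`parse_describe`, `leaderless`) are made up for illustration; the regular expression only covers the fields visible in the output above.

```python
import re

# Describe output copied from the post above.
DESCRIBE_OUTPUT = """\
Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010
"""

LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>\S+)\s+Isr:\s*(?P<isr>\S*)"
)


def parse_describe(text):
    """Turn each partition line into a dict; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        rows.append({
            "topic": m["topic"],
            "partition": int(m["partition"]),
            # "none" means no broker currently leads the partition.
            "leader": None if m["leader"] == "none" else int(m["leader"]),
            "replicas": [int(b) for b in m["replicas"].split(",")],
            "isr": [int(b) for b in m["isr"].split(",") if b],
        })
    return rows


def leaderless(rows):
    """Return only the partitions that have no leader."""
    return [r for r in rows if r["leader"] is None]


rows = parse_describe(DESCRIBE_OUTPUT)
for r in leaderless(rows):
    print(r["topic"], r["partition"], "isr =", r["isr"])
```

For the six partitions shown, every leaderless partition has 1010 as its only in-sync replica.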
From the ZooKeeper CLI we can see that broker id 1010 is not registered:
[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]
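As a small sketch of the same check, the `ls /brokers/ids` output can be compared against the broker IDs the cluster is expected to have. `EXPECTED_BROKERS` and `missing_brokers` are illustrative names, and the input is assumed to be the bracketed list exactly as the ZooKeeper CLI prints it.

```python
import ast

# Broker IDs the cluster is supposed to have, taken from the post above.
EXPECTED_BROKERS = {1010, 1011, 1012}


def missing_brokers(zk_ls_output, expected):
    """Return the expected broker IDs that are absent from /brokers/ids."""
    # The CLI prints a bracketed list such as "[1011, 1012]".
    registered = set(ast.literal_eval(zk_ls_output.strip()))
    return sorted(expected - registered)


print(missing_brokers("[1011, 1012]", EXPECTED_BROKERS))
```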
And in the state-change.log we can see the following:
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
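To get an overview of how many partitions are affected, the warnings above could be summarized per broker. This is only a sketch: the pattern is written against the exact message text shown above, and `offline_partitions` is a hypothetical helper, not something Kafka ships.

```python
import re
from collections import defaultdict

# Matches the "offline log directory" warning as it appears in state-change.log above.
OFFLINE_RE = re.compile(
    r"\[Broker id=(?P<broker>\d+)\] Ignoring LeaderAndIsr request .*?"
    r"for partition (?P<tp>\S+) as the local replica for the partition "
    r"is in an offline log directory"
)


def offline_partitions(lines):
    """Map broker id -> set of (topic, partition) reported in an offline log directory."""
    found = defaultdict(set)
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            # The partition number follows the last "-" of the topic-partition name.
            topic, _, partition = m["tp"].rpartition("-")
            found[int(m["broker"])].add((topic, int(partition)))
    return found


SAMPLE = [
    "[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request "
    "from controller 1010 with correlation id 485 epoch 323 for partition "
    "__consumer_offsets-6 as the local replica for the partition is in an offline "
    "log directory (state.change.logger)",
    "[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request "
    "from controller 1010 with correlation id 485 epoch 323 for partition "
    "__consumer_offsets-9 as the local replica for the partition is in an offline "
    "log directory (state.change.logger)",
]

for broker, partitions in offline_partitions(SAMPLE).items():
    print(broker, sorted(partitions))
```

In practice the lines would come from iterating over the open state-change.log file rather than from `SAMPLE`.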
From ls -ltr we can see that controller.log and state-change.log have not been updated since Dec 16:
-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka 68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka 6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka 323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka 15669106 Dec 20 19:50 kafkaServer.out
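The same observation can be automated by comparing modification times inside the log directory. The sketch below flags files whose last write lags behind the most recently written file by more than a threshold; the function name and the one-day default are assumptions chosen for illustration.

```python
import os


def stale_files(log_dir, max_lag_seconds=24 * 3600):
    """Return names of files last modified much earlier than the newest file in log_dir."""
    mtimes = {}
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            mtimes[name] = os.path.getmtime(path)
    if not mtimes:
        return []
    # Use the newest file as the reference point instead of the wall clock,
    # so the check also works on a copy of the directory taken earlier.
    newest = max(mtimes.values())
    return sorted(n for n, t in mtimes.items() if newest - t > max_lag_seconds)
```

Called on the broker's log directory, this would list controller.log and state-change.log in the situation shown above. It would also list the rotated server.log.N files, which are expected to be old, so the result still needs a manual look.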
What we have done so far:
We restarted all 3 ZooKeeper servers.
We restarted all Kafka brokers.
But Kafka broker 1010 still shows up with Leader: none, and it is still missing from the ZooKeeper data.
Additional info:
[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0
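The `timestamp` field in the `/controller` znode is in milliseconds since the Unix epoch. A short sketch of decoding it (the function name `decode_controller` is made up here) shows that it lines up with the `ctime` ZooKeeper reports, and that the current controller is 1011, whereas the Dec 16 state-change.log entries above name controller 1010.

```python
import json
from datetime import datetime, timezone


def decode_controller(znode_json):
    """Return (broker id, election time in UTC) from the /controller znode payload."""
    data = json.loads(znode_json)
    # The timestamp is stored as a string of epoch milliseconds.
    elected_at = datetime.fromtimestamp(int(data["timestamp"]) / 1000, tz=timezone.utc)
    return data["brokerid"], elected_at


broker_id, elected_at = decode_controller(
    '{"version":1,"brokerid":1011,"timestamp":"1640003679634"}'
)
print(broker_id, elected_at.strftime("%a %b %d %H:%M:%S UTC %Y"))
```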
From kafka01:
more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
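If this check has to be repeated on several machines, the file can be read with a few lines of code. A minimal sketch, assuming the simple `key=value` layout shown above; `read_meta_properties` is an illustrative helper.

```python
META_PROPERTIES = """\
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
"""


def read_meta_properties(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = read_meta_properties(META_PROPERTIES)
print(props["broker.id"], props["cluster.id"])
```

Here the file confirms that the host still identifies itself as broker 1010, so the id itself is not what keeps it out of /brokers/ids.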
A related idea:
On the topics disk we have the following files (besides the topic partition directories):
-rw-r--r-- 1 root kafka 91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka 161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka 1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka 80 Dec 17 09:42 log-start-offset-checkpoint
Any idea whether deleting one or more of the above files could help with our problem?
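All you have shown is that broker 1010 is unhealthy, and you probably have unclean leader election disabled.
ls /brokers/ids shows you the running, healthy brokers from ZooKeeper's point of view.
Meanwhile, the data in the /topics znode refers to a broker listed in the replica set that is not running, or at least not reporting to ZooKeeper, which you would see in server.log.
If you have another broker, you can use the partition reassignment tool to manually remove or change broker 1010 in every partition of every topic it hosts. This removes the old replica information in ZooKeeper and should force a new leader.
You should not delete the checkpoint files, but you can delete the old, rotated log files once you are sure you no longer need them.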
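As a rough sketch of the reassignment suggested above, the snippet below builds a reassignment JSON file (`version` plus a `partitions` list of `topic`, `partition`, `replicas`) that swaps broker 1010 for another broker. Broker id 1013 is purely hypothetical and stands for "another broker"; `build_reassignment` is an illustrative helper, and the two partitions are copied from the describe output in the question.

```python
import json


def build_reassignment(partitions, remove_broker, replacement=None):
    """Build a reassignment plan that drops remove_broker from every replica list.

    If replacement is given, it takes remove_broker's position in the list,
    unless it is already one of the replicas.
    """
    plan = []
    for p in partitions:
        if remove_broker not in p["replicas"]:
            continue  # nothing to change for this partition
        new_replicas = []
        for broker in p["replicas"]:
            if broker != remove_broker:
                new_replicas.append(broker)
            elif replacement is not None and replacement not in p["replicas"]:
                new_replicas.append(replacement)
        plan.append({
            "topic": p["topic"],
            "partition": p["partition"],
            "replicas": new_replicas,
        })
    return {"version": 1, "partitions": plan}


# Two partitions taken from the describe output in the question.
current = [
    {"topic": "__consumer_offsets", "partition": 0, "replicas": [1011, 1010, 1012]},
    {"topic": "__consumer_offsets", "partition": 1, "replicas": [1012, 1011, 1010]},
]

# 1013 is a hypothetical replacement broker, not one from the original cluster.
plan = build_reassignment(current, remove_broker=1010, replacement=1013)
print(json.dumps(plan, indent=2))
```

The resulting JSON would then be saved to a file and passed to kafka-reassign-partitions.sh via `--reassignment-json-file` together with `--execute`. Whether a leader is actually elected afterwards still depends on the ISR and the unclean leader election setting, so this is a starting point rather than a guaranteed fix.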