Zookeeper error: Cannot open channel to X at election address
I have installed ZooKeeper on 3 different AWS servers. Here is the configuration on all of them:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=x.x.x.x:2888:3888
server.2=x.x.x.x:2888:3888
server.3=x.x.x.x:2888:3888
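All three instances have a myid file in /var/zookeeper containing the appropriate ID. All ports on the three servers are open in the AWS console. But when I run the ZooKeeper server, I get the following error on every instance: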
2015-06-19 12:09:22,989 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382]
- Cannot open channel to 2 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382]
- Cannot open channel to 3 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
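One way to narrow this down from any node is to try a plain TCP connection to each peer's election port. "Connection refused" generally means the packet reached the host but nothing was listening on that port (or a host firewall actively rejected it), while a security group silently dropping traffic usually shows up as a timeout instead. The helper below is a minimal sketch; `probe` is a hypothetical name and not part of ZooKeeper.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: usually nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a security group or firewall dropping packets.
        return "timeout"
    except OSError:
        # Anything else, e.g. a name resolution failure or an unreachable network.
        return "error"
```

Running `probe(peer_ip, 3888)` for each peer IP from the server.N lines would, in a setup like the one above, be expected to return "refused" when the peer's election listener is not bound where you think it is, rather than "timeout" from the security group.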
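How have you defined the IP of the local server on each node? If you gave the public IP, the listener will fail to bind to that port (on EC2 the public IP is not configured on the instance's own network interface, so nothing can listen on it). You must specify 0.0.0.0 for the current node: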
server.1=0.0.0.0:2888:3888
server.2=192.168.10.10:2888:3888
server.3=192.168.2.1:2888:3888
The same change must be made on the other nodes, each using 0.0.0.0 for its own entry.
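The per-node rule can be sketched as a small generator. This is an illustrative helper, not a ZooKeeper tool: `server_lines` and its parameters are hypothetical, and the ports default to the 2888/3888 pair used throughout this thread.

```python
from typing import Dict, List


def server_lines(peers: Dict[int, str], my_id: int,
                 quorum_port: int = 2888, election_port: int = 3888) -> List[str]:
    """Build the server.N lines for one node, with 0.0.0.0 for that node's own entry."""
    if my_id not in peers:
        raise ValueError(f"no server.{my_id} entry among ids {sorted(peers)}")
    lines = []
    for sid in sorted(peers):
        # The local node listens on all interfaces; peers are reached at their real address.
        host = "0.0.0.0" if sid == my_id else peers[sid]
        lines.append(f"server.{sid}={host}:{quorum_port}:{election_port}")
    return lines
```

With `peers = {1: "10.0.0.1", 2: "192.168.10.10", 3: "192.168.2.1"}` (10.0.0.1 is a made-up address for node 1 itself), `server_lines(peers, 1)` yields the three lines shown above; passing 2 or 3 gives the config for the other nodes.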
This worked for me:
Step 1:
Node 1:
zoo.cfg
server.1=0.0.0.0:<port>:<port2>
server.2=<IP>:<port>:<port2>
.
.
.
server.n=<IP>:<port>:<port2>
Node 2:
zoo.cfg
server.1=<IP>:<port>:<port2>
server.2=0.0.0.0:<port>:<port2>
.
.
.
server.n=<IP>:<port>:<port2>
Now, in the location defined by dataDir in your zoo.cfg:
Node 1:
echo 1 > <datadir>/myid
Node 2:
echo 2 > <datadir>/myid
.
.
.
Node n:
echo n > <datadir>/myid
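The same step as a script, assuming Python 3 is available on the node. `write_myid` is a hypothetical helper; the file name `myid` and its single-number content are what ZooKeeper reads, and the number must equal the N of this node's server.N line.

```python
from pathlib import Path


def write_myid(data_dir: str, server_id: int) -> Path:
    """Write <data_dir>/myid containing only this node's id, like `echo N > <datadir>/myid`."""
    path = Path(data_dir) / "myid"
    # Create dataDir if it does not exist yet; ZooKeeper needs it to exist and be writable.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{server_id}\n")
    return path
```

For node 2 in the question's setup that would be `write_myid("/var/zookeeper", 2)`.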
This helped me start ZooKeeper successfully, though I'll know more once I start working with it. Hope this helps.
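Had a similar issue on a 3-node ZooKeeper ensemble.
I followed the solution suggested by espeirasbora and restarted.
So here is what I did:
zookeeper1, zookeeper2 and zookeeper3
A. Problem :: the ZooKeeper nodes in my ensemble would not start
B. System setup ::
3 nodes on three machines
C. Error ::
In my ZooKeeper log file I can see the following errors: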
2016-06-26 14:10:17,484 [myid:1] - WARN [SyncThread:1:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:1 took 1340ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-06-26 14:10:17,847 [myid:1] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@810] - Connection broken for id 2, my id = 1, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2016-06-26 14:10:17,848 [myid:1] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@813] - Interrupting SendWorker
2016-06-26 14:10:17,849 [myid:1] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access0(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
2016-06-26 14:10:17,851 [myid:1] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@736] - Send worker leaving thread
2016-06-26 14:10:17,852 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
2016-06-26 14:10:17,854 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
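D. Action and resolution ::
On each node:
a. I modified the config file $ZOOKEEPER_HOME/conf/zoo.cfg to set the machine's own IP to "0.0.0.0", while keeping the IP addresses of the other 2 nodes.
b. Restarted the node.
c. Checked the status.
d. Voila, all good.
See below:
------------------------------------------------
On zookeeper1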
#Before modification
[zookeeper1]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper1]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=0.0.0.0:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#Start ZooKeeper (stop and start, or restart)
[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
------------------------------------------------
On zookeeper2
#Before modification
[zookeeper2]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper2]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=0.0.0.0:2888:3888
server.3=zookeeper3:2888:3888
#Start ZooKeeper (stop and start, or restart)
[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
------------------------------------------------
On zookeeper3
#Before modification
[zookeeper3]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper3]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=0.0.0.0:2888:3888
#Start ZooKeeper (stop and start, or restart)
[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
Here is some Ansible Jinja2 template code for building the cluster automatically, using the 0.0.0.0 hostname for the local node in zoo.cfg:
{% for url in zookeeper_hosts_list %}
{%- set url_host = url.split(':')[0] -%}
{%- if url_host == ansible_fqdn or url_host in ansible_all_ipv4_addresses -%}
server.{{loop.index0}}=0.0.0.0:2888:3888
{% else %}
server.{{loop.index0}}={{url_host}}:2888:3888
{% endif %}
{% endfor %}
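If your own hostname resolves to 127.0.0.1 (in my case the hostname was mapped in /etc/hosts), ZooKeeper will not start unless zoo.cfg has 0.0.0.0 for that node. If your hostname resolves to the machine's actual IP, however, you can put the node's own hostname in the config file.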
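To check which case applies on a given machine, you can ask the resolver what the local hostname maps to. A minimal sketch using only the standard library; `resolves_to_loopback` is a hypothetical helper name.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if any address the resolver gives for hostname is a loopback address."""
    addresses = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    # IPv6 results may carry a scope suffix such as "fe80::1%eth0"; drop it before parsing.
    return any(ipaddress.ip_address(addr.split("%")[0]).is_loopback for addr in addresses)
```

If `resolves_to_loopback(socket.gethostname())` returns True, either use 0.0.0.0 for this node's own server.N entry or fix the hostname mapping.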
Had the same problem and solved it.
Make sure myid matches your configuration in zoo.cfg.
Check the zoo.cfg file in your conf directory, which contains something like this:
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
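Then check the myid file in the server's dataDir directory. For example:
suppose the dataDir defined in zoo.cfg is '/home/admin/data'.
Then on zookeeper1 you must have a file named myid containing the value 1; on zookeeper2, a file named myid containing 2; and on zookeeper3, a file named myid containing 3.
If this is not configured correctly, the server will listen on the wrong ip:port.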
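The myid/zoo.cfg consistency check can be automated. This sketch assumes the plain `server.N=host:quorumPort:electionPort` form used in this thread (an optional `;clientPort` suffix is tolerated; bracketed IPv6 addresses are not handled), and both helper names are hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "server.2=zookeeper2:2888:3888"; anything after the ports is ignored.
SERVER_LINE = re.compile(r"^server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_servers(cfg_text: str) -> Dict[int, str]:
    """Map server id -> host for every server.N line in a zoo.cfg."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def check_myid(cfg_text: str, myid_text: str) -> str:
    """Return the host of this node's server.N entry; raise if myid has no matching entry."""
    my_id = int(myid_text.strip())
    servers = parse_servers(cfg_text)
    if my_id not in servers:
        raise ValueError(f"myid is {my_id}, but zoo.cfg only defines servers {sorted(servers)}")
    return servers[my_id]
```

On zookeeper2, passing the contents of conf/zoo.cfg and /home/admin/data/myid should return "zookeeper2"; any other result means myid and zoo.cfg disagree.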
In my case, the problem was that I had to start all three ZooKeeper servers before I could connect to the ZooKeeper server with ./zkCli.sh.
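We hit the same problem; in our case the root cause was too many client connections. The default ulimit (open-file limit) on AWS EC2 instances is 1024, which left the ZooKeeper nodes unable to communicate with each other.
The fix was:
Raise the ulimit to a larger number, e.g. ulimit -n 20000
Stop and start ZooKeeper.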
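Because `ulimit -n` only applies to the shell it runs in and to processes started from it, the new limit has to be in effect in the environment that launches ZooKeeper, which is why the restart matters. On Linux you can confirm what the running server actually got by reading `/proc/<pid>/limits`. The parser below is a sketch; `max_open_files` is a hypothetical helper, and finding the pid (for example with `pgrep -f QuorumPeerMain`) is left to you.

```python
from typing import Optional, Tuple


def _limit_value(field: str) -> Optional[int]:
    # /proc reports "unlimited" instead of a number when there is no cap.
    return None if field == "unlimited" else int(field)


def max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns after the three-word name: soft limit, hard limit, units.
            soft, hard = line.split()[3:5]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError("no 'Max open files' line found")
```

A soft value still at 1024 after the restart would suggest the raised limit never reached the ZooKeeper process.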
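I had a similar problem. 2 of my 3 ZooKeeper nodes reported their status as "standalone", even though zoo.cfg indicated they should be clustered. My third node would not start because of the error you describe. I think what fixed it for me was running zkServer.sh start on all three nodes in quick succession, so that ZooKeeper was running on all of them before the zoo.cfg initLimit was reached. Hope this is useful to someone out there.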
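For reference, initLimit and syncLimit are counted in ticks, so the actual window depends on tickTime. With the values from the question's zoo.cfg (tickTime=2000, initLimit=10, syncLimit=5) the arithmetic works out as below; whether initLimit really governed the startup race here is this answer's hypothesis, not something the logs confirm.

```python
def limit_seconds(tick_time_ms: int, limit_ticks: int) -> float:
    """initLimit and syncLimit are expressed in ticks of tickTime milliseconds."""
    return tick_time_ms * limit_ticks / 1000


# Values from the zoo.cfg in the question.
TICK_TIME_MS, INIT_LIMIT, SYNC_LIMIT = 2000, 10, 5

# Time followers get to connect and sync to a newly elected leader: 20.0 s.
INIT_WINDOW_S = limit_seconds(TICK_TIME_MS, INIT_LIMIT)
# How far a follower may lag the leader before being dropped: 10.0 s.
SYNC_WINDOW_S = limit_seconds(TICK_TIME_MS, SYNC_LIMIT)
```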
I had the same error log. In my case, I was using the nodes' hostnames in zookeeper.conf.
My nodes were virtual machines running CentOS 8.
As @user2286693 said, my error was in name resolution:
from node1, when I ping node1:
PING node1(localhost (::1)) 56 data bytes
I checked my /etc/hosts file and found:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 node1
I replaced this line with:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
It works!
Hope this helps someone!
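A quick way to spot the same mistake on other nodes is to scan /etc/hosts for the node's own hostname on a loopback line (IPv4 127.x or IPv6 ::1). A minimal sketch; `loopback_aliases` is a hypothetical helper.

```python
import ipaddress
from typing import List


def loopback_aliases(hosts_text: str, hostname: str) -> List[str]:
    """Return the loopback addresses that an /etc/hosts file maps hostname to."""
    hits = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # not an IP address; skip malformed lines
        if is_loopback and hostname in names:
            hits.append(address)
    return hits
```

Against the original line above, `loopback_aliases(open("/etc/hosts").read(), "node1")` returns `["127.0.0.1"]`; after the fix it returns an empty list.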
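Some additional information about ZooKeeper clusters inside an Amazon VPC.
If ZooKeeper runs directly on EC2 instances, the '0.0.0.0' solution works. If you use Docker, however, '0.0.0.0' will not work properly with ZooKeeper 3.5.X after a node restart.
The problem lies in how '0.0.0.0' is resolved and in how the ensemble shares node addresses and SID order (if you start the nodes in descending order, the problem may not occur).
For now, the only working solution is to upgrade to version 3.6.2+.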
When you hit this problem, you see something like this:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
This indicates that the cause is a network communication problem between the ZooKeeper nodes.
How to fix it
Scale zk down to 0, then back up to 3. Wait until they all show Ready.
Now go to zk-0:
oc rsh zk-0
and run this command:
/opt/fusion/bin/zookeeper-client
Connecting to zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983
(--- paused for a moment here ---)
Welcome to ZooKeeper!
JLine support is enabled
[zk: zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983(CONNECTING) 0]
Note that it still shows CONNECTING. This means you are not communicating with ZooKeeper successfully.
When this happens, you will see the following in /opt/fusion/var/log/zookeeper/zookeeper.log:
2021-04-17T00:45:52,848 - WARN [WorkerSender[myid=1]:QuorumCnxManager@584] - Cannot open channel to 2 at election address zk-2.zk:3888
java.net.UnknownHostException: zk-2.zk
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[?:1.8.0_262]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_262]
at java.net.Socket.connect(Socket.java:607) ~[?:1.8.0_262]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
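This is actually the infamous "no route to host" exception we occasionally hit on OpenShift pods (in the log above it surfaces as java.net.UnknownHostException). When it happens, ZooKeeper shows Ready but cannot communicate with the other ZooKeeper nodes, so in a real sense it is not ready.
So how do you fix it?
Scale the zk statefulset to 0, then back up to 3.
Repeat until the connection succeeds: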
/opt/fusion/bin/zookeeper-client
Connecting to zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983
Welcome to ZooKeeper!
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983(CONNECTED) 0]
Note CONNECTED.
Now you can restart the remaining services that depend on zk.
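The "repeat until CONNECTED" loop can also be scripted against the servers directly instead of through the CLI. This sketch uses the `srvr` four-letter command, which replies with a `Mode:` line (leader, follower or standalone), or with a "not currently serving requests" message when the node has no quorum. In ZooKeeper 3.5.3 and later, four-letter commands must be whitelisted via `4lw.commands.whitelist`; `srvr` should be enabled by default there, but check your version. The function names are hypothetical, and the hosts and port 9983 come from the Fusion setup above.

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def parse_mode(reply: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line from a srvr reply, if present."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # e.g. "This ZooKeeper instance is not currently serving requests"


def srvr_mode(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Send the srvr four-letter command to a client port and return the reported mode."""
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            # The server closes the connection after answering, so read until EOF.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        return None  # refused, unresolvable, unreachable or timed out
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def wait_for_quorum(members: Iterable[Tuple[str, int]],
                    timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll until every (host, port) reports leader or follower, or the timeout expires."""
    members = list(members)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        modes = [srvr_mode(host, port) for host, port in members]
        if all(mode in ("leader", "follower") for mode in modes):
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_for_quorum([("zk-0.zk", 9983), ("zk-1.zk", 9983), ("zk-2.zk", 9983)])` returning True means every member reports leader or follower; a node stuck in "standalone" keeps it returning False.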
Same for me: the quorum server port 3181 was still in use by another service. Changing the port fixed it.
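Before picking ports for zoo.cfg, you can check whether anything already holds them on the node. A minimal sketch; `port_is_free` is a hypothetical helper. It simply tries to bind the port, so run it on the same host and address ZooKeeper will use, and note that a recently released port can still be reported as busy for a short while.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error) means ZooKeeper could not bind it either.
            return False
    return True
```

For instance, `[p for p in (2181, 2888, 3888, 3181) if not port_is_free(p)]` lists the ports from this thread that are already taken.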