Zookeeper - Error: the current epoch is older than the last zxid

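I am running a 3-node ZooKeeper ensemble on version 3.4.13. Occasionally, after a machine restart, ZooKeeper fails to start on one of the nodes, and I see the following error in the log: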

2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
... 4 more
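The two numbers in the message are on different scales. A zxid packs the leader epoch into its high 32 bits and a per-epoch transaction counter into its low 32 bits, and `QuorumPeer.loadDataBase` refuses to start when the epoch embedded in the last zxid on disk is newer than the value stored in `currentEpoch`. A small sketch of the decoding (the helper names `zxid_epoch` and `zxid_counter` are illustrative, not part of ZooKeeper):

```shell
#!/usr/bin/env bash
# Split a ZooKeeper zxid into its two 32-bit halves.
# High 32 bits: leader epoch. Low 32 bits: transaction counter within that epoch.
zxid_epoch()   { echo $(( $1 >> 32 )); }
zxid_counter() { echo $(( $1 & 0xFFFFFFFF )); }

# The zxid reported in the error message.
zxid=34359738370
printf 'zxid %d = epoch %d, counter %d (hex 0x%x)\n' \
  "$zxid" "$(zxid_epoch "$zxid")" "$(zxid_counter "$zxid")" "$zxid"
```

Running it prints `zxid 34359738370 = epoch 8, counter 2 (hex 0x800000002)`, so the last transaction on disk belongs to epoch 8 while `currentEpoch` still holds 7.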

I have seen ZOOKEEPER-2354, and the symptoms look similar.

support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8
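To check whether another node is in the same state before changing anything, the three files can be compared with a read-only check. This is a sketch under assumptions: `check_epochs` is a hypothetical helper, the default path comes from the listing above, and reading a leftover `currentEpoch.tmp` as a sign of an interrupted epoch update is an inference from ZooKeeper writing that file under a temporary name and then renaming it, not something the error message itself states.

```shell
#!/usr/bin/env bash
# Read-only comparison of the epoch files in a ZooKeeper version-2 directory.
# Hypothetical helper; the default path is the one used in the question.
check_epochs() {
  local dir="${1:-/var/lib/zookeeper/version-2}"
  local accepted current pending
  # $(cat ...) also strips any trailing newline (these files have none).
  accepted=$(cat "$dir/acceptedEpoch" 2>/dev/null || echo missing)
  current=$(cat "$dir/currentEpoch" 2>/dev/null || echo missing)
  echo "acceptedEpoch=$accepted currentEpoch=$current"
  # A leftover temp file suggests the last epoch write never completed its rename.
  if [ -f "$dir/currentEpoch.tmp" ]; then
    pending=$(cat "$dir/currentEpoch.tmp")
    echo "currentEpoch.tmp=$pending (leftover temp file: an epoch update may have been interrupted)"
  fi
  if [ "$accepted" != "$current" ]; then
    echo "acceptedEpoch and currentEpoch differ"
  fi
}
```

Run it as `sudo bash -c '...'` or as a user that can read the directory; it only reads files and never modifies them.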

That issue says the problem was fixed in 3.4.6, but I am seeing the same behavior in 3.4.13.

Can anyone tell me how to recover the ZooKeeper node from this state?

This was discussed in a zookeeper mailing thread. The relevant quote from that thread:

With the other two zookeeper servers running, I stopped the zookeeper in the broken node, then deleted all the contents inside /var/lib/zookeeper/version-2 and started the zookeeper back on the node. It is running fine now and got all the data from the other servers.
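A minimal sketch of the quoted procedure, assuming: the other two servers are up and form a quorum, ZooKeeper has already been stopped on the broken node, the directory is `/var/lib/zookeeper/version-2` as in the question, and GNU coreutils is available (for `--reference`). `reset_zk_version_dir` is a hypothetical helper, and the `systemctl` unit name in the comments is an assumption that depends on how ZooKeeper was installed. Unlike the quote, it moves the old directory aside instead of deleting it, so the files remain available for inspection.

```shell
#!/usr/bin/env bash
# Sketch: clear the version-2 directory of ONE broken node so it resyncs from the quorum.
# Preconditions (not checked here): the other servers form a healthy quorum and
# ZooKeeper is stopped on this node.
#
# Usage on the broken node, as root (unit name is an assumption):
#   systemctl stop zookeeper
#   reset_zk_version_dir /var/lib/zookeeper/version-2
#   systemctl start zookeeper
reset_zk_version_dir() {
  local dir="${1:-/var/lib/zookeeper/version-2}"
  dir="${dir%/}"                      # tolerate a trailing slash
  local backup
  backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
  # Keep the old snapshots, logs and epoch files instead of deleting them outright.
  mv "$dir" "$backup" || return 1
  # Recreate an empty directory with the original owner and mode, so a
  # ZooKeeper process running as a non-root user can still write to it.
  mkdir "$dir" || return 1
  chown --reference="$backup" "$dir" || return 1
  chmod --reference="$backup" "$dir" || return 1
  echo "$backup"                      # print where the old contents went
}
```

The `myid` file normally lives in the data directory itself rather than in `version-2`, so it is left untouched. If `dataLogDir` is configured separately, the transaction logs are in a different `version-2` directory, which presumably needs the same treatment.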