Starting a dead node of Percona XtraDB Cluster

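We have a three-node XtraDB cluster. One node was not shut down cleanly and now will not start. The other two nodes are working and responding normally. The only thing in the log is: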

-- Unit mysql.service has begun starting up.
Aug 25 04:40:45 percona-prod-perconaxtradb-vm-0 /etc/init.d/mysql[2503]: MySQL PID not found, pid_file detected/guessed: /var/run/mysql
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: Starting MySQL (Percona XtraDB Cluster) database server: mysqld . . . . .
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: failed!
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: mysql.service: control process exited, code=exited status=1
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daem
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

In /var/lib/mysql/wsrep_recovery.qEEkjd we found this:

2018-08-25T05:49:31.055887Z 0 [ERROR] Found 20 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2018-08-25T05:49:31.055892Z 0 [ERROR] Aborting

2018-08-25T05:49:31.055901Z 0 [Note] Binlog end

We want to discard these 20 prepared transactions entirely.

The other two nodes are consistent and working, so it would be enough to tell this node: "ignore your state and sync with other nodes".

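In the end we deleted the /data folder on the dead node and restarted the node. The node then started an SST (state snapshot transfer) copy. This took a long time, and the only way to see any progress was to check the size of the folder. But eventually it worked.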
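
Since folder size was the only progress indicator we had, here is a rough sketch of how that check could be automated. It is not part of Percona's tooling; the function names `dir_size_bytes` and `watch`, the polling interval, and the output format are all made up for illustration, and it assumes the SST is writing into the folder you point it at.

```python
import os
import time


def dir_size_bytes(path):
    """Return the total size in bytes of all files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can appear and vanish while the transfer is running.
                pass
    return total


def watch(path, interval_s=30, rounds=None):
    """Print the folder size and its growth rate every interval_s seconds.

    rounds=None polls until interrupted (hypothetical helper, not a Percona tool).
    """
    previous = dir_size_bytes(path)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval_s)
        current = dir_size_bytes(path)
        rate_mb_s = (current - previous) / interval_s / 1e6
        print(f"{current / 1e9:.2f} GB total, {rate_mb_s:+.2f} MB/s")
        previous = current
        done += 1
```

Calling `watch("/data")` on the joining node prints one line per interval until you stop it with Ctrl-C. Comparing the total against the data folder size on one of the healthy nodes gives a rough idea of how far along the transfer is.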