pt-table-checksum not detecting diffs

I have a simple master -> slave MariaDB setup:

Master: Ubuntu 16.04 LTS with MariaDB 10.2.8 and percona-toolkit 3.0.4

Slave: Ubuntu 16.04 LTS with MariaDB 10.2.7

Replication is running fine, and now I want to check whether the data on the master and the slave is identical.

I installed percona-toolkit on the master and created a checksum user:

MariaDB> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `pt_checksum`@'%' IDENTIFIED BY 'password';
MariaDB> GRANT ALL PRIVILEGES ON percona.* TO `pt_checksum`@'%';
MariaDB> FLUSH PRIVILEGES;

I also added report_host to the slave's configuration so that it presents itself to the master:

MariaDB [(none)]> show slave hosts;
+-----------+-----------+------+-----------+
| Server_id | Host      | Port | Master_id |
+-----------+-----------+------+-----------+
|         2 | 10.0.0.49 | 3306 |         1 |
+-----------+-----------+------+-----------+
1 row in set (0.00 sec)

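To test pt-table-checksum, I deleted one row from the Tickets table in my test database on the slave. I have verified that the row is indeed missing there but still present on the master.

However, pt-table-checksum does not report this difference: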

# pt-table-checksum --databases=shop_test --tables=Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-07T16:15:02      0      0       14       1       0   0.013 shop_test.Tickets

So I set PTDEBUG=1 in my environment, but the master appears to connect to the slave just fine. I have tried to pick out the relevant bits of the output:

# MasterSlave:5175 9725 Connected to h=localhost,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 1
# MasterSlave:5219 9725 Looking for slaves on h=localhost,p=...,u=pt_checksum using methods processlist hosts
# MasterSlave:5226 9725 Finding slaves with _find_slaves_by_processlist
# MasterSlave:5288 9725 DBI::db=HASH(0x31c5190) SHOW GRANTS FOR CURRENT_USER()
# MasterSlave:5318 9725 DBI::db=HASH(0x31c5190) SHOW FULL PROCESSLIST
# DSNParser:1417 9725 Parsing h=10.0.0.49
[...]
# MasterSlave:5231 9725 Found 1 slaves
# MasterSlave:5208 9725 Recursing from h=localhost,p=...,u=pt_checksum to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5155 9725 Recursion methods: processlist hosts
[...]
# MasterSlave:5175 9725 Connected to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 2
# MasterSlave:5097 9725 Found slave: h=10.0.0.49,p=...,u=pt_checksum
[...]
# pt_table_checksum:9793 9725 Exit status 0 oktorun 1
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31cd218) Disconnecting dbh on slaveserver h=10.0.0.49
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31c5190) Disconnecting dbh on masterserver h=localhost

I have no idea why the missing row is not detected.

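Over the weekend I noticed a new bug report, and today I confirmed that it is indeed the problem I was running into.

The workaround is to add --set-vars binlog_format=statement.

With this option set, the difference shows up after the second run.

During the first run, the checksums table on the slave changed from: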

MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl     | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | f30abebe |       14 | f30abebe   |         14 |
+---------+----------+----------+------------+------------+

...to...

MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl     | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | 284ec207 |       13 | f30abebe   |         14 |
+---------+----------+----------+------------+------------+
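The mismatch between this_crc/this_cnt and master_crc/master_cnt can also be looked for directly on the slave rather than by eye. The following is a minimal sketch modelled on the query given in the pt-table-checksum documentation; it assumes the default replicate table percona.checksums, and the function name checksum_diff_sql is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of percona-toolkit): print a query that
# lists tables whose slave-side checksum or row count differs from the
# values recorded for the master.
# Assumption: the default --replicate table, percona.checksums, is in use.
checksum_diff_sql() {
    cat <<'SQL'
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl;
SQL
}

# Print the query; pipe it into a mysql client connected to the slave to run it.
checksum_diff_sql
```

On the slave this could be run as something like `checksum_diff_sql | mysql --user=pt_checksum -p`. An empty result means no recorded chunk differs, which is only meaningful once the checksum statements have actually been replicated as statements.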

After the second run, the difference also shows up in the pt-table-checksum output:

# pt-table-checksum --tables=shop_test.Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters --set-vars binlog_format=statement
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-11T11:17:37      0      1       14       1       0   0.022 shop_test.Tickets
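Since a DIFFS value of 0 turned out to be misleading here, it can help to check that column mechanically when re-running the tool. Below is a small sketch that filters pt-table-checksum output for tables with a non-zero DIFFS value; it assumes the eight-column layout shown above, and the function name tables_with_diffs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of percona-toolkit): read pt-table-checksum
# output on stdin and print every table whose DIFFS column is non-zero.
# Assumption: the 8-column layout shown in this post:
#   TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
tables_with_diffs() {
    awk '
        $1 == "TS" { next }    # skip the header line
        # data lines have 8 fields; field 3 is DIFFS, field 8 is db.table
        NF == 8 && $3 ~ /^[0-9]+$/ && $3 > 0 { print $8 " diffs=" $3 }
    '
}

# Example: feed it the header and the two result lines from this post.
printf '%s\n' \
    '            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE' \
    '09-07T16:15:02      0      0       14       1       0   0.013 shop_test.Tickets' \
    '09-11T11:17:37      0      1       14       1       0   0.022 shop_test.Tickets' \
    | tables_with_diffs
```

Only the second result line is reported, because it is the only one with DIFFS greater than 0. In practice the real pt-table-checksum command would be piped into tables_with_diffs instead of the printf sample.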

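I confirmed with SHOW VARIABLES LIKE 'binlog_format' that binlog_format is still 'MIXED', so apparently it is only changed for the duration of the session. According to the documentation, as far as I understand it, this should happen automatically: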

This works only with statement-based replication (pt-table-checksum will switch the binlog format to STATEMENT for the duration of the session if your server uses row-based replication).

Bug report: https://jira.percona.com/browse/PT-1443

[Update] The issue was still unresolved as of September 2020.