pt-table-checksum not detecting diffs
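I have a simple master -> slave MariaDB setup:
Master: Ubuntu 16.04 LTS with MariaDB 10.2.8 and percona-toolkit 3.0.4
Slave: Ubuntu 16.04 LTS with MariaDB 10.2.7
Replication is running fine, and now I want to check whether the data on the master and the slave is identical.
I installed percona-toolkit on the master and created a checksum user: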
MariaDB> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `pt_checksum`@'%' IDENTIFIED BY 'password';
MariaDB> GRANT ALL PRIVILEGES ON percona.* TO `pt_checksum`@'%';
MariaDB> FLUSH PRIVILEGES;
I also added report_host to the slave's config so that it presents itself to the master:
MariaDB [(none)]> show slave hosts;
+-----------+-----------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+-----------+------+-----------+
| 2 | 10.0.0.49 | 3306 | 1 |
+-----------+-----------+------+-----------+
1 row in set (0.00 sec)
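To test pt-table-checksum, I deleted a row from the Tickets table in my test database on the slave. I have verified that the row is indeed missing there but still present on the master.
But pt-table-checksum does not report the difference: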
# pt-table-checksum --databases=shop_test --tables=Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-07T16:15:02 0 0 14 1 0 0.013 shop_test.Tickets
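So I set PTDEBUG=1 in my environment, but the master appears to connect to the slave just fine. I have tried to pick out the relevant bits from the output: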
# MasterSlave:5175 9725 Connected to h=localhost,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 1
# MasterSlave:5219 9725 Looking for slaves on h=localhost,p=...,u=pt_checksum using methods processlist hosts
# MasterSlave:5226 9725 Finding slaves with _find_slaves_by_processlist
# MasterSlave:5288 9725 DBI::db=HASH(0x31c5190) SHOW GRANTS FOR CURRENT_USER()
# MasterSlave:5318 9725 DBI::db=HASH(0x31c5190) SHOW FULL PROCESSLIST
# DSNParser:1417 9725 Parsing h=10.0.0.49
[...]
# MasterSlave:5231 9725 Found 1 slaves
# MasterSlave:5208 9725 Recursing from h=localhost,p=...,u=pt_checksum to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5155 9725 Recursion methods: processlist hosts
[...]
# MasterSlave:5175 9725 Connected to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 2
# MasterSlave:5097 9725 Found slave: h=10.0.0.49,p=...,u=pt_checksum
[...]
# pt_table_checksum:9793 9725 Exit status 0 oktorun 1
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31cd218) Disconnecting dbh on slaveserver h=10.0.0.49
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31c5190) Disconnecting dbh on masterserver h=localhost
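I have no idea why the missing row is not detected.
Over the weekend I noticed a new bug report, and today I confirmed that this is indeed the issue I was hitting.
The workaround is to add --set-vars binlog_format=statement.
With this option set, the difference shows up after the second run.
During the first run, the checksums table on the slave changed from: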
MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | f30abebe | 14 | f30abebe | 14 |
+---------+----------+----------+------------+------------+
...to...
MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | 284ec207 | 13 | f30abebe | 14 |
+---------+----------+----------+------------+------------+
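To make the comparison explicit: a chunk counts as different when the slave's this_crc/this_cnt disagree with master_crc/master_cnt. Below is a minimal sketch of that check, using an in-memory SQLite table as a stand-in for percona.checksums (an assumption for illustration only; on a real setup the same WHERE condition would be run against percona.checksums on the slave). The helper name find_diffs is hypothetical, and the two sample rows are the ones shown above.

```python
import sqlite3

# Replica-side comparison: a chunk differs when count or CRC disagree,
# or when exactly one of the two CRCs is NULL.
DIFF_QUERY = """
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR (master_crc IS NULL) <> (this_crc IS NULL)
GROUP BY db, tbl
"""

def find_diffs(rows):
    """Load checksum rows into a scratch table and return differing tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checksums ("
        "db TEXT, tbl TEXT, this_crc TEXT, this_cnt INTEGER, "
        "master_crc TEXT, master_cnt INTEGER)"
    )
    conn.executemany("INSERT INTO checksums VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(DIFF_QUERY).fetchall()
    conn.close()
    return result

# Row as seen on the slave before the workaround (checksums match).
before = [("shop_test", "Tickets", "f30abebe", 14, "f30abebe", 14)]
# Row as seen on the slave after running with binlog_format=statement.
after = [("shop_test", "Tickets", "284ec207", 13, "f30abebe", 14)]

print(find_diffs(before))
print(find_diffs(after))
```

The first call returns no rows, while the second reports shop_test.Tickets, which matches the DIFFS column going from 0 to 1.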
After the second run, the difference also shows up in the pt-table-checksum output:
# pt-table-checksum --tables=shop_test.Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters --set-vars binlog_format=statement
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-11T11:17:37 0 1 14 1 0 0.022 shop_test.Tickets
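I confirmed with SHOW VARIABLES LIKE 'binlog_format' that binlog_format is still 'MIXED', so apparently it is only changed for the duration of the session. According to the documentation, as far as I understand it, this should happen automatically: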
This works only with statement-based replication (pt-table-checksum
will switch the binlog format to STATEMENT for the duration of the
session if your server uses row-based replication).
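As I understand it, this is why the binlog format matters: the checksum statement has to be re-executed on the slave against the slave's own data. If the master's result instead reaches the slave as a row event, the slave simply stores the master's values and the two always look identical. The following is a rough simulation of that idea, not the tool's actual code; checksum_chunk, replicate, the CRC function, and the ticket names are all made up for illustration.

```python
import zlib

def checksum_chunk(rows):
    """Stand-in for the checksum query: returns (crc_hex, row_count)."""
    data = "\n".join(sorted(rows)).encode()
    return format(zlib.crc32(data), "08x"), len(rows)

def replicate(master_rows, slave_rows, binlog_format):
    """Return (master_result, slave_result) for one checksummed chunk."""
    master_result = checksum_chunk(master_rows)
    if binlog_format == "STATEMENT":
        # The slave re-runs the statement, so it checksums its own data.
        slave_result = checksum_chunk(slave_rows)
    else:
        # Row event: the slave stores the row the master already computed.
        slave_result = master_result
    return master_result, slave_result

master = [f"ticket-{i}" for i in range(14)]
slave = master[:-1]  # one row deleted on the slave, as in my test

print(replicate(master, slave, "ROW"))
print(replicate(master, slave, "STATEMENT"))
```

With "ROW" both tuples are identical even though the slave is missing a row; with "STATEMENT" the slave reports 13 rows and a different CRC.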
Bug report:
https://jira.percona.com/browse/PT-1443
[Update] The issue was still unresolved as of September 2020.