Debezium MySQL (MariaDB) - wrong message counts

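I'm using Debezium to export a database. I tested this setup earlier with about 1% of the production data and it worked fine, but in the production setup I see a mismatch between the number of rows in the database and the number of messages exported by Debezium.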

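For example, I have a table db.large with roughly 259 million entries, but Debezium exported only about 200 million. For some other tables, Debezium exported more messages than there are rows in the table (this is during the initial snapshot only). For a small table with only 542 entries, the counts match.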

I see some "Failed to flush" and "Failed to commit offsets" messages in the log, but not for every offset flush; some of them succeed. Could these flush/commit failures be the cause of the mismatch?

I'm using the MySQL connector with Debezium 1.7.

Here is part of the log that shows the mismatch:

INFO   ||  WorkerSourceTask{id=connector-v1-0} flushing 5722 outstanding messages for offset commit
ERROR  ||  WorkerSourceTask{id=connector-v1-0} Failed to flush, timed out while waiting for producer to flush outstanding 211 messages
ERROR  ||  WorkerSourceTask{id=connector-v1-0} Failed to commit offsets
INFO   MySQL|connector_v1|snapshot       Exported 201944873 of 259000000 records for table 'db.large' after 10:09:38.853
INFO   MySQL|connector_v1|snapshot       Exported 202002217 of 259000000 records for table 'db.large' after 10:09:49.062
INFO   MySQL|connector_v1|snapshot       Exported 202057513 of 259000000 records for table 'db.large' after 10:09:59.281
INFO   MySQL|connector_v1|snapshot       Exported 202112809 of 259000000 records for table 'db.large' after 10:10:09.488
INFO   MySQL|connector_v1|snapshot       Exported 202168105 of 259000000 records for table 'db.large' after 10:10:19.669
INFO   MySQL|connector_v1|snapshot       Exported 202221353 of 259000000 records for table 'db.large' after 10:10:30.152
INFO   ||  WorkerSourceTask{id=connector-v1-0} flushing 5788 outstanding messages for offset commit
INFO   MySQL|connector_v1|snapshot       Exported 202278697 of 259000000 records for table 'db.large' after 10:10:40.334
ERROR  ||  WorkerSourceTask{id=connector-v1-0} Failed to flush, timed out while waiting for producer to flush outstanding 561 messages
ERROR  ||  WorkerSourceTask{id=connector-v1-0} Failed to commit offsets
INFO   MySQL|connector_v1|snapshot       Exported 202336041 of 259000000 records for table 'db.large' after 10:10:50.352
INFO   MySQL|connector_v1|snapshot       Finished exporting 202353026 records for table 'db.large'; total duration '10:10:53.191'
INFO   MySQL|connector_v1|snapshot  Exporting data from table 'db.small' (2 of 7 tables)
INFO   MySQL|connector_v1|snapshot       For table 'db.small' using select statement: 'SELECT `field1`, `field2`, `field3` FROM `db`.`small`'
INFO   MySQL|connector_v1|snapshot       Finished exporting 500 records for table 'db.small'; total duration '00:00:00.021'
INFO   MySQL|connector_v1|snapshot  Exporting data from table 'db.medium' (3 of 7 tables)
INFO   MySQL|connector_v1|snapshot       For table 'db.medium' using select statement: 'SELECT `field1`, `field2`, `field3`  FROM `db`.`medium`'
INFO   MySQL|connector_v1|snapshot       Exported 84873 of 14000000 records for table 'db.medium' after 00:00:10.006
INFO   MySQL|connector_v1|snapshot       Exported 170889 of 14000000 records for table 'db.medium' after 00:00:20.172
INFO   MySQL|connector_v1|snapshot       Exported 258953 of 14000000 records for table 'db.medium' after 00:00:30.267
INFO   MySQL|connector_v1|snapshot       Exported 349065 of 14000000 records for table 'db.medium' after 00:00:40.392

Any ideas? Thanks.

Figured it out: the number of exported messages was actually correct.

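The answer is that the total Debezium prints in those log lines (the "of 259000000 records" part) is not the actual row count but an estimated one: https://github.com/debezium/debezium/blob/8d71080a9a8aac875e338964af417dc8de93dfcc/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/MySqlConnection.java#L427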
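
To see the difference per table, here is a minimal sketch that pulls both numbers out of the snapshot log. It assumes the log line format shown in the question; the function name `summarize_snapshot` and the regular expressions are my own, not part of Debezium. The total in the "Exported X of N" lines is the estimate, while "Finished exporting N records" is the number of rows actually read. As far as I can tell, the estimate comes from the table statistics MySQL reports, which are approximate for InnoDB, so it is worth checking the real row count with `SELECT COUNT(*)` before assuming data was lost.

```python
import re

# Progress lines carry the estimated total ("of N records").
PROGRESS = re.compile(r"Exported (\d+) of (\d+) records for table '([^']+)'")
# The final line per table carries the number of rows actually exported.
FINISHED = re.compile(r"Finished exporting (\d+) records for table '([^']+)'")


def summarize_snapshot(lines):
    """Return {table: {"estimated": int or None, "exported": int or None}}."""
    summary = {}
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            entry = summary.setdefault(
                match.group(3), {"estimated": None, "exported": None}
            )
            entry["estimated"] = int(match.group(2))
            continue
        match = FINISHED.search(line)
        if match:
            entry = summary.setdefault(
                match.group(2), {"estimated": None, "exported": None}
            )
            entry["exported"] = int(match.group(1))
    return summary


# Sample lines copied from the log in the question.
sample = [
    "INFO   MySQL|connector_v1|snapshot       Exported 202336041 of 259000000 records for table 'db.large' after 10:10:50.352",
    "INFO   MySQL|connector_v1|snapshot       Finished exporting 202353026 records for table 'db.large'; total duration '10:10:53.191'",
    "INFO   MySQL|connector_v1|snapshot       Finished exporting 500 records for table 'db.small'; total duration '00:00:00.021'",
]

result = summarize_snapshot(sample)
for table, counts in result.items():
    print(table, counts)
```

Tables that finish before the first progress line is logged (like db.small here) have no estimate in the log, so `estimated` stays `None` for them.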