Broken MySQL GTID replication (misaligned GTIDs)

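I am running Percona MySQL 5.6 on Debian 8 with slave_parallel_workers=5. Sometimes GTID replication breaks and I don't know why. I thought GTIDs were executed in consecutive order, but this is what the slave status shows: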

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: d22.local
                  Master_User: xyz
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.039232
          Read_Master_Log_Pos: 219044
               Relay_Log_File: mysqld-relay-bin.072392
                Relay_Log_Pos: 90640
        Relay_Master_Log_File: mysql-bin.036196
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: xyz_etl
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table xyz.sessions; Can't find record in 'sessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.036196, end_log_pos 78709552
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 78708927
              Relay_Log_Space: 1337994488
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table xyz.sessions; Can't find record in 'sessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.036196, end_log_pos 78709552
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 22
                  Master_UUID: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 161219 20:32:20
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53:60397-45157441
            Executed_Gtid_Set: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53:1-42679868:42679870-42679876:42679878-42679879:42679881-42679890:42679892-42679908:42679910:42679913:42679916-42679917:42679919-42679927:42679929-42679932:42679934:42679936:42679938-42679939:42679944:42679946-42679950:42679952-42679955:42679957-42679964:42679966:42679969-42679970:42679972:42679974-42679977:42679979-42679980:42679984-42679986:42679988-42679990:42679994-42679996:42679998:42680000-42680001:42680003-42680006:42680009-42680011:42680013-42680018:42680021:42680024:42680026:42680030:42680032:42680035:42680038,
aea3618e-bacf-11e6-9506-b8ca3a67f830:1-10937274
                Auto_Position: 1
1 row in set (0.00 sec)

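I'm a bit confused. slave_parallel_workers is now set to 0, but the error reported above is for GTID 42679909 and not for 42679869, the first missing one, as I would expect. What is the reason for this, and what are the correct steps to repair the broken replication above? What I don't understand is that, in theory, the transaction with GTID 42679869 could be executed without any problem, yet running STOP SLAVE; START SLAVE; does not process it.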

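To answer this question and help others, here are the steps I took:

  • Set slave_parallel_workers=0
  • Look only at the Executed_Gtid_Set field and fill each gap with an empty transaction: STOP SLAVE; SET GTID_NEXT="[...]"; BEGIN; COMMIT; SET GTID_NEXT="AUTOMATIC"; START SLAVE;
  • Work through all the gaps in the GTID list one by one
  • Once replication reaches the point where it continues on its own without errors, set slave_parallel_workers back to its previous value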
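
Working through the gaps by hand is tedious with a set as fragmented as the one above, so here is a minimal sketch of how the list of missing GTIDs could be computed and turned into the statements from the steps. It assumes the Executed_Gtid_Set value has been copied out of SHOW SLAVE STATUS as a string; the function names (parse_gtid_intervals, find_gaps, empty_transaction_sql) are made up for illustration and are not part of any MySQL tooling. It only prints SQL and does not connect to a server.

```python
def parse_gtid_intervals(gtid_set, server_uuid):
    """Return the sorted (start, end) intervals listed for server_uuid."""
    intervals = []
    # The set may be wrapped over several lines; entries are comma-separated.
    for entry in gtid_set.replace("\n", "").split(","):
        parts = entry.strip().split(":")
        if parts[0].lower() != server_uuid.lower():
            continue
        for rng in parts[1:]:
            start, _, end = rng.partition("-")
            # A single number such as "42679910" is an interval of length one.
            intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def find_gaps(intervals):
    """Return every transaction number missing between consecutive intervals."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        gaps.extend(range(prev_end + 1, next_start))
    return gaps


def empty_transaction_sql(server_uuid, gaps):
    """Build the statement sequence that commits one empty transaction per gap."""
    lines = ["STOP SLAVE;"]
    for number in gaps:
        lines.append(f"SET GTID_NEXT='{server_uuid}:{number}';")
        lines.append("BEGIN; COMMIT;")
    lines.append("SET GTID_NEXT='AUTOMATIC';")
    lines.append("START SLAVE;")
    return "\n".join(lines)


if __name__ == "__main__":
    master_uuid = "0e7b97a8-a689-11e5-8b79-901b0e8b0f53"
    # Shortened excerpt of the Executed_Gtid_Set shown above.
    executed = (
        master_uuid + ":1-42679868:42679870-42679876:42679878-42679879,\n"
        "aea3618e-bacf-11e6-9506-b8ca3a67f830:1-10937274"
    )
    gaps = find_gaps(parse_gtid_intervals(executed, master_uuid))
    print(empty_transaction_sql(master_uuid, gaps))
```

Keep in mind that an empty transaction only marks the GTID as executed; the changes of the skipped transaction are not applied on the slave, so the affected rows may still differ from the master afterwards.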