liquibase 无法更新数据库 and/or 非常慢

liquibase fails to update DB and/or is very slow

当我们使用 Liquibase 执行以下变更集时,我们遇到了一个非常奇怪的问题:

<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.3.xsd">

    <changeSet id="add companyinfo.identifier" author="domi">
        <addColumn tableName="companyinfo">
            <column name="identifier" type="VARCHAR(50)" />
        </addColumn>
    </changeSet>

    <changeSet id="add uk companyinfo.identifier" author="domi">
        <addUniqueConstraint    columnNames="identifier" 
                                tableName="companyinfo" 
                                constraintName="uk_companyinfo_identifier"/>
    </changeSet>

</databaseChangeLog>

变更集将新列 (identifier) 添加到 table companyinfo 并在同一列上定义唯一约束。 table 有大约 1'500'000 条记录。

这些变更集是针对 MySQL(版本 5.5.54)执行的,由 Jenkins 管道脚本触发 org.liquibase:liquibase-maven-plugin:3.5.3:update maven plugin/goal。 (Jenkins 和 DB 位于美国东海岸)。 我们使用的 JDBC 驱动程序是: mysql:mysql-connector-java:5.1.41

通常我们对 liquibase 没有任何问题,但是当我们运行这个变更集时,我们总是在正好 2 小时后得到这样的错误:

[INFO] --- liquibase-maven-plugin:3.5.3:update (default) @ persistence ---
[INFO] ------------------------------------------------------------------------
[INFO] Executing on Database: jdbc:mysql://yyyy.xxxx.net:3306/my_db?characterEncoding=utf8
INFO 5/2/17 5:11 PM: liquibase: Successfully acquired change log lock
INFO 5/2/17 5:11 PM: liquibase: Reading from DATABASECHANGELOG
SEVERE 5/2/17 7:11 PM: liquibase: src/main/resources/db/db.changelog-master.xml: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Change Set db/changelog/db.changelog-companyinfo_identifier.xml::add add companyinfo.identifier::domi failed.  Error: Communications link failure

The last packet successfully received from the server was 7,200,055 milliseconds ago.  The last packet sent successfully to the server was 7,200,054 milliseconds ago. [Failed SQL: ALTER TABLE companyinfo ADD identifier VARCHAR(50) NULL]
INFO 5/2/17 7:11 PM: liquibase: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Successfully released change log lock
SEVERE 5/2/17 7:11 PM: liquibase: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Could not release lock
liquibase.exception.LockException: liquibase.exception.DatabaseException: Error executing SQL UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1: No operations allowed after connection closed.
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:283)
    at liquibase.Liquibase.update(Liquibase.java:218)
    at liquibase.Liquibase.update(Liquibase.java:192)
    at liquibase.Liquibase.update(Liquibase.java:335)
    at org.liquibase.maven.plugins.LiquibaseUpdate.doUpdate(LiquibaseUpdate.java:33)
    at org.liquibase.maven.plugins.AbstractLiquibaseUpdateMojo.performLiquibaseTask(AbstractLiquibaseUpdateMojo.java:30)
    at org.liquibase.maven.plugins.AbstractLiquibaseMojo.execute(AbstractLiquibaseMojo.java:394)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: liquibase.exception.DatabaseException: Error executing SQL UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1: No operations allowed after connection closed.
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:68)
    at liquibase.executor.jvm.JdbcExecutor.update(JdbcExecutor.java:231)
    at liquibase.executor.jvm.JdbcExecutor.update(JdbcExecutor.java:205)
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:267)
    ... 28 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
    at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1198)
    at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1193)
    at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2388)
    at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2372)
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:52)
    ... 31 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 7,200,055 milliseconds ago.  The last packet sent successfully to the server was 7,200,054 milliseconds ago.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)

当我 运行 从我的笔记本电脑执行与 Liquibase 相同的命令时,不是通过 JDBC 而是从 SequelPro MySql 客户端连接到与上面相同的数据库(客户端在瑞士,DB 在美国东海岸),它在大约 15 分钟内顺利完成,没有任何问题。

这些是 Liquibase 执行的语句:

--  *********************************************************************
--  Update Database Script
--  *********************************************************************
--  Change Log: src/main/resources/db/db.changelog-master.xml
--  Ran at: 5/3/17 7:55 AM
--  Against: XXXX@CCCCC@jdbc:mysql://yyyy.xxxx.net:3306/my_db?characterEncoding=utf8
--  Liquibase version: 3.5.3
--  *********************************************************************

--  Lock Database
UPDATE DATABASECHANGELOGLOCK SET LOCKED = 1, LOCKEDBY = 'xxxxx (192.168.1.24)', LOCKGRANTED = '2017-05-03 07:55:44.564' WHERE ID = 1 AND LOCKED = 0;

--  Changeset db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi
ALTER TABLE companyinfo ADD identifier VARCHAR(50) NULL;

INSERT INTO DATABASECHANGELOG (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('add companyinfo.identifier', 'domi', 'db/changelog/db.changelog-companyinfo_identifier.xml', NOW(), 838, '7:b2d3082917bf3ff3aecb6cbc363a5e9c', 'add companyinfo.identifier', '', 'EXECUTED', NULL, NULL, '3.5.3', '3790945685');

--  Changeset db/changelog/db.changelog-companyinfo_identifier.xml::add uk companyinfo.identifier::domi
ALTER TABLE companyinfo ADD CONSTRAINT uk_companyinfo_identifier UNIQUE (identifier);

INSERT INTO DATABASECHANGELOG (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('add uk companyinfo.identifier', 'domi', 'db/changelog/db.changelog-companyinfo_identifier.xml', NOW(), 839, '7:5d98affa45f814b9ad32bc9c954ed32b', 'addUniqueConstraint constraintName=uk_companyinfo_identifier, tableName=companyinfo', '', 'EXECUTED', NULL, NULL, '3.5.3', '3790945685');

--  Release Database Lock
UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1;

据我所知,数据库没有设置与 2 小时限制匹配的超时:

我觉得有两个问题很有意思:

...但也许两个问题的答案相同


更新:我现在从我的本地环境(也连接到有问题的同一个远程数据库)执行了我从 Jenkins 触发的完全相同的 maven/liquibase 命令 - 你猜怎么着:它就像一个魅力,完全没有问题 - 命令在大约 20 分钟内完成!

mvn -f pom.xml process-resources -Pupdate-db -Dliquibase.username=xxx -Dliquibase.password=xxxx -Dliquibase.url=jdbc:mysql://yyyy.xxxx.net:3306/my_db:3306/yooture_ci?characterEncoding=utf8

所以我最后的猜测是我们的 CI 基础设施和导致问题的数据库之间一定有什么东西,但是是什么?它总是有效,只是在这种情况下不...

我们终于找到问题了...

问题是由默认的 AWS NAT 配置引起的。 The AWS documentation 说:

If a connection that's using a NAT gateway is idle for 5 minutes or more, the connection times out

它真的很低,它肯定会破坏大多数操作系统的默认配置。