Liferay 7,Hikari-Pool connection not available 错误,在生产环境中发生但不在预生产环境中

Liferay 7, Hikari-Pool connection not available error, happen on production but not on pre-production environment

首先,我有两个配置完全相同(IP 除外)的环境,每个环境都有两个虚拟机。一个在预生产中,一个在生产中(目前处于配置阶段)。有一个带有 liferay 7.0.6 tomcat 包的虚拟机(从社区安全团队的 7.0.6 累积补丁构建)和另一个带有 postgresql 9.4.26.

在预生产环境中一切正常。

在生产环境中,开始在 liferay i 中创建用户后几个小时 运行 出现此错误(最后是完整堆栈):

Caused by: java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 937980ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
    at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.getTargetConnection(LazyConnectionDataSourceProxy.java:403)
    at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.invoke(LazyConnectionDataSourceProxy.java:376)
    at com.sun.proxy.$Proxy7.prepareStatement(Unknown Source)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:534)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:452)
    at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
    at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1700)
    at org.hibernate.loader.Loader.doQuery(Loader.java:801)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
    at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
    ... 50 more 

所以我使用比较软件检查了我的预生产和生产中的 liferay 配置之间是否存在差异,除了 IP,我什么也没发现。与两种环境中的 postgresql 配置相同。

我还检查了 vm 之间的时间同步,它们都是通过 debian 池服务器上的 ntp 同步的。

数据库虚拟机:

n$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 0.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 1.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 2.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 3.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
+mail.klausen.dk 193.67.79.202    2 u  116 1024  377   14.435   -0.214   0.358
+any.time.nl     85.199.214.99    2 u  871 1024  377    1.666   -0.183   0.258
-rag.9t4.net     131.188.3.221    2 u  102 1024  377   16.491   -3.769   0.571
*ntp1.m-online.n 212.18.1.106     2 u  318 1024  377   16.608   -0.263   0.240
-tor-relais1.lin 131.188.3.223    2 u  199 1024  377   14.149    0.272   0.661
-www.kashra.com  .DCFp.           1 u  150 1024  377   22.623    1.126   0.816

和 Liferay 虚拟机:

$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 0.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 1.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 2.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
 3.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.002
+stratum2-4.ntp. 129.70.137.82    2 u  200 1024  377   30.471    1.069   2.414
+138.201.16.225  131.188.3.221    2 u  696 1024  377   16.613    1.357   0.397
-kuehlich.com    131.188.3.221    2 u  373 1024  377   22.656    2.025   0.885
-time.cloudflare 10.21.8.19       3 u  566 1024  377    8.123   -0.640   0.277
*a.chl.la        131.188.3.222    2 u  167 1024  377   14.472    1.033   2.448
+195.50.171.101  145.253.3.52     2 u  266 1024  377   10.804    0.928   0.395

我还注意到 postgresql 日志中的一条错误行,关于未知 PID 上的进程取消请求恰好发生在 Liferay 超时错误之前的毫秒数(937980 毫秒):

 LOG: PID 1767 in the cancel request does not match any process

我尝试从头开始重新安装 Liferay,但没有任何变化。 它应该在预生产和生产之间存在差异,因为它在预生产中工作正常但我找不到它。

liferay 中的 HikariCP 配置是两种环境的默认设置

jdbc.default.connectionTimeout=30000
jdbc.default.driverClassName=org.postgresql.Driver
jdbc.default.idleConnectionTestPeriod=60
jdbc.default.idleTimeout=600000
jdbc.default.initialPoolSize=10
jdbc.default.liferay.pool.provider=hikaricp
jdbc.default.maxActive=100
jdbc.default.maxIdleTime=3600
jdbc.default.maxLifetime=0
jdbc.default.maxPoolSize=100
jdbc.default.maximumPoolSize=100
jdbc.default.minIdle=10
jdbc.default.minPoolSize=10
jdbc.default.minimumIdle=10

,postgresql 也一样:

max_connections = 100

来自 liferay 的 HikariPool full satck :

2021-06-30 00:12:32.397 ERROR [liferay/scheduler_dispatch-6][JDBCExceptionReporter:234] HikariPool-2 - Connection is not available, request timed out after 937980ms.
2021-06-30 00:12:32.401 ERROR [liferay/scheduler_dispatch-6][BasePersistenceImpl:264] Caught unexpected exception
com.liferay.portal.kernel.exception.SystemException: com.liferay.portal.kernel.dao.orm.ORMException: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
    at com.liferay.portal.kernel.service.persistence.impl.BasePersistenceImpl.processException(BasePersistenceImpl.java:270)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl._obtainIncrement(CounterFinderImpl.java:391)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl._competeIncrement(CounterFinderImpl.java:339)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl._competeIncrement(CounterFinderImpl.java:325)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:111)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:100)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:95)
    at com.liferay.counter.service.impl.CounterLocalServiceImpl.increment(CounterLocalServiceImpl.java:42)
    at sun.reflect.GeneratedMethodAccessor638.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:163)
    at com.liferay.portal.spring.transaction.CounterTransactionExecutor.execute(CounterTransactionExecutor.java:50)
    at com.liferay.portal.spring.transaction.TransactionInterceptor.invoke(TransactionInterceptor.java:58)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:137)
    at com.liferay.portal.spring.aop.ServiceBeanAopProxy.invoke(ServiceBeanAopProxy.java:169)
    at com.sun.proxy.$Proxy78.increment(Unknown Source)
    at com.liferay.counter.kernel.service.CounterLocalServiceUtil.increment(CounterLocalServiceUtil.java:238)
    at com.liferay.portal.kernel.systemevent.SystemEventHierarchyEntryThreadLocal.push(SystemEventHierarchyEntryThreadLocal.java:134)
    at com.liferay.portal.kernel.systemevent.SystemEventHierarchyEntryThreadLocal.push(SystemEventHierarchyEntryThreadLocal.java:96)
    at com.liferay.portal.repository.capabilities.TemporaryFileEntriesCapabilityImpl._runWithoutSystemEvents(TemporaryFileEntriesCapabilityImpl.java:313)
    at com.liferay.portal.repository.capabilities.TemporaryFileEntriesCapabilityImpl.deleteExpiredTemporaryFileEntries(TemporaryFileEntriesCapabilityImpl.java:113)
    at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.deleteExpiredTemporaryFileEntries(TempFileEntriesMessageListener.java:111)
    at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.performAction(TempFileEntriesMessageListener.java:134)
    at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.performAction(TempFileEntriesMessageListener.java:130)
    at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.performAction(DefaultActionableDynamicQuery.java:405)
    at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.call(DefaultActionableDynamicQuery.java:315)
    at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.call(DefaultActionableDynamicQuery.java:277)
    at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.doPerformActions(DefaultActionableDynamicQuery.java:335)
    at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.performActions(DefaultActionableDynamicQuery.java:86)
    at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.doReceive(TempFileEntriesMessageListener.java:139)
    at com.liferay.portal.kernel.messaging.BaseMessageListener.receive(BaseMessageListener.java:26)
    at com.liferay.portal.kernel.scheduler.messaging.SchedulerEventMessageListenerWrapper.receive(SchedulerEventMessageListenerWrapper.java:66)
    at com.liferay.portal.kernel.messaging.InvokerMessageListener.receive(InvokerMessageListener.java:74)
    at com.liferay.portal.kernel.messaging.ParallelDestination.run(ParallelDestination.java:52)
    at com.liferay.portal.kernel.concurrent.ThreadPoolExecutor$WorkerTask._runTask(ThreadPoolExecutor.java:756)
    at com.liferay.portal.kernel.concurrent.ThreadPoolExecutor$WorkerTask.run(ThreadPoolExecutor.java:667)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.liferay.portal.kernel.dao.orm.ORMException: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
    at com.liferay.portal.dao.orm.hibernate.ExceptionTranslator.translate(ExceptionTranslator.java:34)
    at com.liferay.portal.dao.orm.hibernate.SessionImpl.get(SessionImpl.java:205)
    at com.liferay.portal.kernel.dao.orm.ClassLoaderSession.get(ClassLoaderSession.java:326)
    at com.liferay.counter.service.persistence.impl.CounterFinderImpl._obtainIncrement(CounterFinderImpl.java:369)
    ... 36 more
Caused by: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
    at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:140)
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:128)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
    at org.hibernate.loader.Loader.loadEntity(Loader.java:2041)
    at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:86)
    at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:76)
    at org.hibernate.persister.entity.AbstractEntityPersister.load(AbstractEntityPersister.java:3294)
    at org.hibernate.event.def.DefaultLoadEventListener.loadFromDatasource(DefaultLoadEventListener.java:496)
    at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:477)
    at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:227)
    at org.hibernate.event.def.DefaultLoadEventListener.lockAndLoad(DefaultLoadEventListener.java:403)
    at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:155)
    at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:1090)
    at org.hibernate.impl.SessionImpl.get(SessionImpl.java:1075)
    at org.hibernate.impl.SessionImpl.get(SessionImpl.java:1066)
    at com.liferay.portal.dao.orm.hibernate.SessionImpl.get(SessionImpl.java:201)
    ... 38 more
Caused by: java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 937980ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
    at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.getTargetConnection(LazyConnectionDataSourceProxy.java:403)
    at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.invoke(LazyConnectionDataSourceProxy.java:376)
    at com.sun.proxy.$Proxy7.prepareStatement(Unknown Source)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:534)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:452)
    at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
    at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1700)
    at org.hibernate.loader.Loader.doQuery(Loader.java:801)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
    at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
    ... 50 more

实际上我没有其他想法可以测试,任何帮助将不胜感激。

我们终于找到了解决方案。

在预生产环境中,两个虚拟机位于同一 VLAN 上,而在生产环境中并非如此。

解决方法:将虚拟机放在同一个VLAN上解决问题。