Tomcat java 应用程序线程饥饿 - 所有线程都在等待 c3p0 BasicResourcePool.awaitAvailable

Thread starvation on Tomcat java application - all threads in WAITING for c3p0 BasicResourcePool.awaitAvailable

我们的 java Web 应用程序当前遇到随机(并且以前从未遇到过)阻塞问题,这是由创建的线程突然激增以及随后耗尽 Tomcat 的最大线程数引起的可以创建(默认 = 200)。

最后一个峰值在 10 分钟内创建了 120 个新线程,使计数从 80 增加到 200,并阻止了应用程序。通常的线程数是 70-80。

进行线程转储时,所有这些线程都处于 WAITING 状态,似乎所有线程都在等待从 c3p0 到 DBMS 的新连接。

这是一个示例堆栈:

"http-nio-8443-exec-77  Waiting Thread ID: 10016","1"

"java.lang.Object.wait(long)","2"
"com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(long) BasicResourcePool.java:1414","2"
"com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(long) BasicResourcePool.java:606","2"
"com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(long) BasicResourcePool.java:526","2"
"com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse() C3P0PooledConnectionPool.java:755","2"
"com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection() C3P0PooledConnectionPool.java:682","2"
"com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection() AbstractPoolBackedDataSource.java:140","2"
"org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource.getConnection() AbstractRoutingDataSource.java:164","2"
"org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection() DatasourceConnectionProviderImpl.java:139","2"
"org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection() AbstractSessionImpl.java:380","2"
"org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection() LogicalConnectionImpl.java:228","2"
"org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection() LogicalConnectionImpl.java:171","2"
"org.hibernate.internal.SessionImpl.connection() SessionImpl.java:450","2"
"org.springframework.orm.hibernate4.HibernateTransactionManager.doBegin(Object, TransactionDefinition) HibernateTransactionManager.java:450","2"
"org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(TransactionDefinition) AbstractPlatformTransactionManager.java:373","2"
"org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(PlatformTransactionManager, TransactionAttribute, String) TransactionAspectSupport.java:463","2"
"org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(Method, Class, TransactionAspectSupport$InvocationCallback) TransactionAspectSupport.java:276","2"
"org.springframework.transaction.interceptor.TransactionInterceptor.invoke(MethodInvocation) TransactionInterceptor.java:96","2"
"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() ReflectiveMethodInvocation.java:179","2"
"org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(MethodInvocation) ExposeInvocationInterceptor.java:92","2"
"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() ReflectiveMethodInvocation.java:179","2"
"org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(Object, Method, Object[], MethodProxy) CglibAopProxy.java:653","2"

这是我们的 c3p0 配置 DataSource:

<property name="maxIdleTime" value="600" />
<property name="maxPoolSize" value="300" />
<property name="maxStatements" value="200" />
<property name="maxStatementsPerConnection" value="10" />
<property name="minPoolSize" value="15" />
<property name="idleConnectionTestPeriod" value="100" />
<property name="acquireIncrement" value="3" />
<property name="numHelperThreads" value="20" />

我目前正在尝试添加一个超时参数来帮助我调试这些情况

<property name="checkoutTimeout" value="30" />

我读到我应该使用 debugUnreturnedConnectionStackTracesunreturnedConnectionTimeout 来调试更好的这类问题,但我目前对它们不是很了解并且不敢在生产中使用它们。

是否有任何其他类型的配置可以帮助我调试问题?其他一些超时配置?

堆栈:

编辑(检查超时测试)

我试图在有问题的数据库上输入参数 checkoutTimeout:

<property name="checkoutTimeout" value="30" />

这 属性 应该强制所有正在等待从 c3p0 签出连接的线程在 30 秒后放弃,以防池耗尽,并抛出 Sql异常。

将此 属性 投入生产后,我们开始记录此类错误:

Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection

    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)

    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)

    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:112)

    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:235)

    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:171)

    at org.hibernate.internal.SessionImpl.connection(SessionImpl.java:450)

    at org.springframework.orm.hibernate4.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:450)

    ... 119 more

Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.

    at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:118)

    at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:77)

    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:687)

    at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)

    at org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:164)

    at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:139)

    at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:380)

    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:228)

    ... 122 more

Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@14e83e8f -- timeout at awaitAvailable()

    at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1416)

    at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:606)

    at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:526)

    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:755)

    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:682)

    ... 127 more

所以我想这证明了 out pool 已经耗尽(maxPoolSize 是 300)。对吗?

编辑 2:问题已解决,请查看我的答案

您可以要求您的 Sql 服务器 DBA 提供一些统计数据,包括在特定时间段内绑定到您的应用程序的会话数、执行所花费的时间 SQL 等。

我们能够通过 JMX 的帮助和 c3p0 数据源状态的监控解决问题。

使用 JVisualVM(或者如果您没有 Oracle JDK,您也可以使用 JConsole)我们查看了数据源的状态,显示如下:

我们的池大小是 300,这意味着 c3p0 在任何给定时间最多可以有 300 个连接可用,并且已经有 95 个处于忙碌状态,而且这个数字还在上升。

经过反复试验,我们知道这个问题只发生在特定客户身上,我们开始测试该客户使用的新功能(问题最近才出现),我们发现了一个特定的用例与我们一起使用重现问题,不断泄漏连接。

在代码库中,问题是这个用例打开了一个会话,但从未注意关闭它。

事情是这样的:

this.sessionFactory.withOptions().noInterceptor().openSession();

我们添加了会话的关闭,问题已解决

session.flush();
session.close();