当与 IBM MQ 的连接出现问题时,如何解决 Apache Camel 中的 JVM 挂起问题?

How to solve the JVM hung issue in Apache Camel when there is a connection issue to IBM MQ?

每当 IBM MQ 出现 network/connection 问题时,我们发现 JVM 经常挂在基于 Apache camel 的应用程序中。

记录器明确指出发生了连接问题,Spring CachingConnectionFactory 正在尝试重置底层 MQ 连接。重置连接时,Spring 和 IBM MQ Lib 之间似乎存在切换问题。

Jul 13, 2018 8:51:48 PM org.springframework.jms.connection.CachingConnectionFactory onException
WARNING: Encountered a JMSException - resetting the underlying JMS Connection
com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred.
An error has occurred with the IBM MQ JMS connection.
Caused by: com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
    at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:203)
    ... 220 more
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2009

在完全相同的时间戳,JVM 挂起并且 DMLC 不再处理消息。但我确实看到消费者队列中有 20 个听众。

我对进程进行了线程转储,发现 hung/blocked 个导致 JVM 挂起的线程。

这是由于线程阻塞而正在等待的 JMSCCThreadPoolWorker 堆栈跟踪。

JMSCCThreadPoolWorker-727742 
Stack Trace is: 
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006d75964c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at com.ibm.msg.client.jms.internal.JmsSessionImpl$ReentrantDoubleLock.getExclusiveLock(JmsSessionImpl.java:4931)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2521)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2498)
at com.ibm.msg.client.jms.internal.JmsConnectionImpl.stop(JmsConnectionImpl.java:1263)
- locked <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)
at com.ibm.mq.jms.MQConnection.stop(MQConnection.java:473)
at org.springframework.jms.connection.SingleConnectionFactory.closeConnection(SingleConnectionFactory.java:452)
at org.springframework.jms.connection.SingleConnectionFactory.resetConnection(SingleConnectionFactory.java:345)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.CachingConnectionFactory.resetConnection(CachingConnectionFactory.java:207)
at org.springframework.jms.connection.SingleConnectionFactory.onException(SingleConnectionFactory.java:323)
at org.springframework.jms.connection.SingleConnectionFactory$AggregatedExceptionListener.onException(SingleConnectionFactory.java:673)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at com.ibm.msg.client.jms.internal.JmsProviderExceptionListener.run(JmsProviderExceptionListener.java:413)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:319)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:343)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1227)
Locked ownable synchronizers:
- <0x00000006ca05eb40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

这是阻塞线程的堆栈跟踪:

Stack Trace is: 
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.jms.connection.SingleConnectionFactory.getConnection(SingleConnectionFactory.java:281)
- waiting to lock <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.SingleConnectionFactory.createConnection(SingleConnectionFactory.java:224)
at org.springframework.jms.connection.JmsTransactionManager.createConnection(JmsTransactionManager.java:288)
at org.springframework.jms.connection.JmsTransactionManager.doBegin(JmsTransactionManager.java:186)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.handleExistingTransaction(AbstractPlatformTransactionManager.java:429)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:349)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.FatalFallbackErrorHandler.process(FatalFallbackErrorHandler.java:81)
at org.apache.camel.processor.RedeliveryErrorHandler.deliverToFailureProcessor(RedeliveryErrorHandler.java:1057)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:542)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:114)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:112)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:721)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:681)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:651)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:235)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1166)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1060)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000006d0b19600> (a java.util.concurrent.ThreadPoolExecutor$Worker)

你需要弄清楚哪个线程拥有ReentrantLock < 0x00000006d75964c0>;它不会显示在堆栈跟踪中。

这可能是 IBM 客户端中的死锁 - 线程无法解锁它或者它们有锁定顺序问题。

您可能会查找在

上尝试同步时被阻止的线程
- waiting to lock <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)

可能是该线程已经获取了 ReentrantLock。如果不存在这样的线程,那么很可能是前一种情况(解锁失败)。

无论如何,死锁似乎是在 IBM 代码中。很难调试,因为(我最后一次查看)MQ 客户端是闭源的。您可能需要向 IBM 开票;假设你在那里有支持。