Tomcat 100% CPU 使用率

Tomcat 100% CPU Usage

我尝试了所有方法来找到原因,但不确定发生了什么,尝试了 jProfile,但没有得到任何结果,因为 cpu 已经达到 100%。

I am running a Spring application (WAR file) on amazon linux on Tomcat8. no database operations but yes it access a rest api for different operations running in separate environment. There are 3 servers have same issue running behind a load balancer in a beanstalk. with average of 518K requests per hour and average latency of 72.2 milliseconds

感觉代码本身问题不大

我刚刚尝试了一个命令 kill -3 pid(java 进程),并在 catalina.out 文件中得到了一些结果。我可以一次又一次地看到以下错误

"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-exec-18" #41 daemon prio=5 os_prio=0 tid=0x00007f088c024000 nid=0xfb1 waiting on condition [0x00007f0882eed000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

最后是以下信息

"VM Thread" os_prio=0 tid=0x00007f08d8081000 nid=0xf8b runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f08d801e800 nid=0xf89 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f08d8020800 nid=0xf8a runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f08d80f0000 nid=0xf92 waiting on condition 

JNI global references: 478

Heap
 PSYoungGen      total 85504K, used 28368K [0x00000000fab00000, 0x0000000100000000, 0x0000000100000000)
  eden space 83968K, 33% used [0x00000000fab00000,0x00000000fc61cd08,0x00000000ffd00000)
  from space 1536K, 39% used [0x00000000ffd00000,0x00000000ffd97340,0x00000000ffe80000)
  to   space 1536K, 0% used [0x00000000ffe80000,0x00000000ffe80000,0x0000000100000000)
 ParOldGen       total 175104K, used 107142K [0x00000000f0000000, 0x00000000fab00000, 0x00000000fab00000)
  object space 175104K, 61% used [0x00000000f0000000,0x00000000f68a1a48,0x00000000fab00000)
 Metaspace       used 56563K, capacity 60632K, committed 60800K, reserved 1103872K
  class space    used 5218K, capacity 5736K, committed 5760K, reserved 1048576K

有人可以解释一下这里发生了什么吗?

Ok here are some of the logs I pulled from the log which are in running state

(日志中有几次)

"ajp-nio-8009-Acceptor-0" #23 daemon prio=5 os_prio=0 tid=0x00007f08d8535800 nid=0xfa0 runnable [0x00007f0883ffe000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0881618> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

"ajp-nio-8009-ClientPoller-1" #22 daemon prio=5 os_prio=0 tid=0x00007f08d8534000 nid=0xf9f runnable [0x00007f08a83b2000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f16a8100> (a sun.nio.ch.Util)
    - locked <0x00000000f16a80f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16a7fc8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"ajp-nio-8009-ClientPoller-0" #21 daemon prio=5 os_prio=0 tid=0x00007f08d82f8800 nid=0xf9e runnable [0x00007f08a84b3000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f16a8900> (a sun.nio.ch.Util)
    - locked <0x00000000f16a88f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16a87c8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0882f68> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-ClientPoller-1" #19 daemon prio=5 os_prio=0 tid=0x00007f08d82f5800 nid=0xf9c runnable [0x00007f08a86b5000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f1677900> (a sun.nio.ch.Util)
    - locked <0x00000000f16778f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16777a8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-ClientPoller-0" #18 daemon prio=5 os_prio=0 tid=0x00007f08d82f4000 nid=0xf9b runnable [0x00007f08a87b6000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f1666290> (a sun.nio.ch.Util)
    - locked <0x00000000f1666280> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f1666138> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

还有一个不确定是否引起问题(但在日志中只有一次)

"http-nio-8080-exec-131" #160 daemon prio=5 os_prio=0 tid=0x00007f088c100800 nid=0x153e runnable [0x00007f08727e5000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000f688f4d0> (a java.io.BufferedInputStream)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
    - locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    - locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getHeaderFields(HttpURLConnection.java:2966)
    at com.code.http.WebUtility.getUrlContents(WebUtility.java:163)

并且这一行的代码是 con.getHeaderFields(),其中 con 是 HttpURLConnection

try {
    con = (HttpURLConnection) url.opencon();
    con.setDoInput(true);
    con.setDoOutput(true);
    con.connect();
    writeJson (con);

    this.header = con.getHeaderFields();//this is the line in running state
    this.code = con.getResponseCode();
    return readSuccessStream(con);
} catch (IOException e) {
    if (con != null) {
        return readFailureStream(con);
    }
    return "a nasty error occured";
} finally {
    if (con != null) {
        con.disconnect();
    }
}

运行 州

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f08d80db000 nid=0xf91 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f08d80c8800 nid=0xf90 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f08d80bb000 nid=0xf8f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f08d80b9000 nid=0xf8e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

最后一个

"main" #1 prio=5 os_prio=0 tid=0x00007f08d8009800 nid=0xf88 runnable [0x00007f08de871000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.catalina.core.StandardServer.await(StandardServer.java:446)
    at org.apache.catalina.startup.Catalina.await(Catalina.java:717)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:663)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)

线程状态

Java 虚拟机中线程的状态是:

NEW
A thread that has not yet started is in this state.

RUNNABLE
A thread executing in the Java virtual machine is in this state.

BLOCKED
A thread that is blocked waiting for a monitor lock is in this state.

WAITING
A thread that is waiting indefinitely for another thread to perform a particular action is in this state.

TIMED_WAITING
A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.

TERMINATED
A thread that has exited is in this state.

在 Java 中,线程表示为线程对象 (java.lang.Thread)。当你创建一个新线程时,字面意思是 new Thread(),你创建了一个线程,但它还没有做任何事情。一旦您调用 Thread.start(),它就会从 NEW 状态移动到 RUNNABLE 状态。 Runnable 意味着它有等待 CPU 执行的工作。虽然 运行,线程可以在 RUNNABLEBLOCKEDWAITINGTIMED_WAITING 之间移动。当线程退出时,它处于 TERMINATED 状态。

当尝试诊断 Java 进程的高 CPU 使用率时,我们可以排除 NEWTERMINATED 的线程,因为它们已经死了。 BLOCKEDWAITINGTIMED_WAITING 的线程未执行任何操作。从字面上看,他们在等待未来的某个事件。

这是什么意思?只有处于 RUNNABLE 状态的线程才能对 CPU.

施加负载

这些线程是什么?

Tomcat 使用线程池可以同时处理多个请求。

第一个线程是接受线程。它的目的是在网络套接字上侦听传入请求。但是为了同时处理多个请求,它需要委托工作。否则,每个请求都会锁定服务器直到完成,这意味着一次只有一个用户可以连接到 Web 服务器。所以这个线程简单地接受连接,然后将它们交给另一个线程,这样它就可以接受更多的连接。

"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0882f68> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

这些线程是 Tomcat 分配请求的工作线程:

"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

这意味着线程正在休眠。它正在等待接受器线程给它工作。它对您的 100% CPU 使用没有任何贡献。

结论

我在您的线程转储中看到的可运行线程表明,从网络套接字读取是消耗 CPU 负载的地方。我不能明确地说出为什么。有两个可能的原因:网络问题或远程系统非常潜伏读取是内存使用问题。

大多数时候,当我看到这个时,CPU 工作来自垃圾收集器,它在另一个线程中运行。当存在分配内存的压力并且几乎没有可用内存时,垃圾收集器必须在其他线程等待时做大量昂贵的工作来寻找可用内存。该线程可能在您显示的线程转储中不可见,因为它不是在 Java 代码中创建的线程,而是 Java 虚拟机的内部部分。我不能确定这是否是您的问题所在。这可能是网络问题或您正在读取的远程系统的问题。我建议仔细查看 每个 线程以尝试找到一个模式。