Evaluation - OutOfMemoryError: unable to create new native thread


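Although this is a typical error message reported by many Stack Overflow users, my question is about how to evaluate whether a proposed solution has actually fixed the problem.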

I have read various discussions and articles about this error, and most of the solutions dig into Linux ulimits, which I suspect applies in my case too.

My ulimit values are:

STACK 10240k, CORE 0k, NPROC 1024, NOFILE 4096;

I suspect the problem may be that NPROC / NOFILE are too low (they are just the defaults).
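On Linux, NPROC (RLIMIT_NPROC, what `ulimit -u` sets) is not a per-process limit: getrlimit(2) describes it as a limit on the number of extant processes, or more precisely threads, for the real user ID. Every thread of every process running as the same user counts, so a JVM can hit the limit even though it has created few threads itself. As a minimal sketch, assuming a Linux /proc filesystem and Java 7 or later (to match the JRE in the question), the hypothetical class `UserTaskCount` below sums the `Threads:` field of `/proc/<pid>/status` for every process whose real UID matches the current one. That total approximates what the kernel compares against the 1024 limit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: sums the threads of every process owned by one real UID,
// roughly the number the kernel checks against RLIMIT_NPROC.
public class UserTaskCount {

    // Returns the text after "Key:" in a /proc/<pid>/status file, or null if absent.
    static String statusField(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                return line.substring(key.length() + 1).trim();
            }
        }
        return null;
    }

    // "Uid:" holds real, effective, saved and filesystem UIDs; NPROC uses the real one.
    static String realUid(List<String> lines) {
        String uids = statusField(lines, "Uid");
        return uids == null ? null : uids.split("\\s+")[0];
    }

    static List<String> readStatus(Path procDir) throws IOException {
        // ISO-8859-1 accepts every byte, so unusual process names cannot break decoding.
        return Files.readAllLines(procDir.resolve("status"), StandardCharsets.ISO_8859_1);
    }

    static long countTasksForUid(String uid) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> procs =
                 Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try {
                    List<String> status = readStatus(proc);
                    if (uid.equals(realUid(status))) {
                        String threads = statusField(status, "Threads");
                        if (threads != null) {
                            total += Long.parseLong(threads);
                        }
                    }
                } catch (IOException | NumberFormatException e) {
                    // The process exited or was unreadable while scanning; skip it.
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String uid = realUid(readStatus(Paths.get("/proc/self")));
        System.out.println("real uid " + uid + " owns " + countTasksForUid(uid) + " tasks");
    }
}
```

Run it as the user that owns the failing JVM. If the total is already close to the NPROC soft limit before the JVM starts, NPROC is the likely culprit.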

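However, I would like to know whether there is an exact way to determine the root cause, for example that NPROC has been exceeded, and whether there is a way to accurately measure how many processes/file handles are currently in use. Or are there other issues I should be looking at that can be measured statistically?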
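For the JVM's own usage, /proc/self exposes both the live counts and the limits the process actually received, which can differ from what an interactive `ulimit` shows (for example when the JVM is launched by an init script). A minimal sketch, assuming Linux and Java 7+; `ProcessUsage` and its methods are hypothetical names. `ThreadMXBean.getThreadCount()` only counts Java threads, whereas the `Threads:` line also includes the VM's own native threads such as GC workers, which is the kind of thread the hs_err log shows failing. Keep in mind that NOFILE applies per process while NPROC applies per user, so the thread count here is only this process's share of the NPROC budget.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ProcessUsage {

    // Soft-limit column of a /proc/self/limits row such as
    // "Max processes   1024   1024   processes"; null if the row is missing.
    static String softLimit(List<String> limits, String name) {
        for (String line : limits) {
            if (line.startsWith(name)) {
                return line.substring(name.length()).trim().split("\\s+")[0];
            }
        }
        return null;
    }

    // All native threads of this process (Java, GC, JIT...), from "Threads:".
    static long nativeThreads() throws IOException {
        List<String> status =
            Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.ISO_8859_1);
        for (String line : status) {
            if (line.startsWith("Threads:")) {
                return Long.parseLong(line.substring("Threads:".length()).trim());
            }
        }
        return -1;
    }

    // Open file descriptors; includes the one held by the directory stream itself.
    static long openFds() throws IOException {
        long n = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path ignored : fds) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        List<String> limits =
            Files.readAllLines(Paths.get("/proc/self/limits"), StandardCharsets.ISO_8859_1);
        System.out.println("native threads    = " + nativeThreads());
        System.out.println("java threads      = " + ManagementFactory.getThreadMXBean().getThreadCount());
        System.out.println("open fds          = " + openFds());
        System.out.println("NPROC soft limit  = " + softLimit(limits, "Max processes"));
        System.out.println("NOFILE soft limit = " + softLimit(limits, "Max open files"));
    }
}
```

The same methods could be called periodically from inside the application to log how close it runs to each limit before the error occurs.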

FYI, when this problem occurred, heap dumps were not enabled, and there is no thread dump data from the point of the error.

Thanks for any input on evaluating and fixing this problem.

Here is the short stack trace:

Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)

Here are the system values:

OS:Red Hat Enterprise Linux Server release 6.3 (Santiago)
uname:Linux 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64
libc:glibc 2.12 NPTL 2.12
rlimit: STACK 10240k, CORE 0k, NPROC 1024, NOFILE 4096, AS infinity
load average:0.11 0.10 0.03
CPU:total 32 (8 cores per cpu, 2 threads per core) family 6 model 45 stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, aes, ht, tsc, tscinvbit, tscinv

/proc/meminfo:
MemTotal:       74206252 kB
MemFree:         2788244 kB
Buffers:         1042212 kB
Cached:         58454988 kB
SwapCached:         2860 kB
Active:         38242540 kB
Inactive:       29129604 kB

Here is the information from the JVM crash report, hs_err_pidxxxxx.log:

# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
...
#  Out of Memory Error (gcTaskThread.cpp:46), pid=20396, tid=140365307795200

# JRE version:  (7.0_80-b15) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

Current thread (0x00007fa95400a800):  JavaThread "Unknown thread" [_thread_in_vm, id=20458, stack(0x00007fa9583f5000,0x00007fa9584f6000)]
Stack: [0x00007fa9583f5000,0x00007fa9584f6000],  sp=0x00007fa9584f4540,  free space=1021k

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x9a320a]  VMError::report_and_die()+0x2ea
V  [libjvm.so+0x498d3b]  report_vm_out_of_memory(char const*, int, unsigned long, char const*)+0x9b
V  [libjvm.so+0x55943a]  GCTaskThread::GCTaskThread(GCTaskManager*, unsigned int, unsigned int)+0x11a
V  [libjvm.so+0x5589b8]  GCTaskManager::initialize()+0x2b8
V  [libjvm.so+0x843438]  ParallelScavengeHeap::initialize()+0x6f8
V  [libjvm.so+0x97509a]  Universe::initialize_heap()+0xca
V  [libjvm.so+0x976269]  universe_init()+0x79
V  [libjvm.so+0x5b2f25]  init_globals()+0x65
V  [libjvm.so+0x95db4d]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
V  [libjvm.so+0x63b2e4]  JNI_CreateJavaVM+0x74
C  [libjli.so+0x2f8e]  JavaMain+0x9e
Java Threads: ( => current thread )
Other Threads:
=>0x00007fa95400a800 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=20458, stack(0x00007fa9583f5000,0x00007fa9584f6000)]
VM state:not at safepoint (not fully initialized)
VM Mutex/Monitor currently owned by a thread: None
GC Heap History (0 events):
No events
Deoptimization events (0 events):
No events
Internal exceptions (0 events):
No events
Events (0 events):
No events

I wanted to know if there is an exact way to identify the root cause, say that NPROC has been exceeded, etc.

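The JVM, like any other software, ultimately has to talk to the kernel through system calls. To spawn a new thread it has to use the clone syscall, which can return various error codes (documented in the man pages). You can use strace to log the system calls and see their error codes, which can give more fine-grained information than the OOME.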
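To see that syscall-level error directly, a small reproducer helps. The sketch below is hypothetical (the class name `ThreadExhaustion` and its cap argument are illustrative, and it avoids lambdas so it compiles on Java 7 like the JRE in the question). It starts parked daemon threads until `Thread.start()` throws `OutOfMemoryError` or the cap is reached. One way to use it, on a throwaway test box rather than a shared host: in a subshell, lower the limit with `ulimit -u` to a value somewhat above the user's current task count, then run `strace -f -e trace=clone,mmap -o clone.log java ThreadExhaustion` and look in clone.log for calls ending in `= -1 EAGAIN`. Per clone(2) and fork(2), EAGAIN is what the kernel returns when RLIMIT_NPROC (or a system-wide limit such as threads-max) is hit, whereas an `mmap` of a thread stack failing with `ENOMEM` would point at memory or address-space exhaustion instead. Newer glibc versions may use clone3 rather than clone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproducer: starts parked threads until Thread.start() throws
// "unable to create new native thread" or the cap is reached.
public class ThreadExhaustion {

    // Returns how many threads were started before the first failure (or max).
    static int startThreads(int max, final CountDownLatch release, List<Thread> started) {
        for (int i = 0; i < max; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        release.await();   // park until main() releases everyone
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            t.setDaemon(true);
            try {
                t.start();                 // HotSpot -> pthread_create -> clone underneath
            } catch (OutOfMemoryError e) {
                System.err.println("failed after " + i + " threads: " + e.getMessage());
                return i;
            }
            started.add(t);
        }
        return max;
    }

    public static void main(String[] args) throws InterruptedException {
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        int n = startThreads(max, release, started);
        System.out.println("started " + n + " threads");
        release.countDown();               // let every parked thread finish
        for (Thread t : started) {
            t.join();
        }
    }
}
```

Comparing the number of threads it manages to start against the per-user task count and the NPROC soft limit gives a direct check of whether raising NPROC actually moves the failure point.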