MongoDB ops manager "java.lang.OutOfMemoryError: unable to create native thread"

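I am currently setting up a new MongoDB Ops Manager machine. The installation works, but I cannot start the mongodb-mms service: startup of instance 0 fails with a java.lang.OutOfMemoryError exception. I am using the same configuration as on my test server (2 CPU cores, 8 GB RAM), where the service starts without any problems. Changing the ulimit configuration and starting the service as root both made no difference.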

New server specs:

Since the new server is shared with other users, could the host be limiting CPU usage per user?

mms0.log:

[Starting Logging - App Version: 4.2.23.57072.20210126T1748Z]
2021-03-28T19:32:11.682+0000 [main] INFO  com.xgen.svc.mms.dao.mongo.MongoSvcUriImpl [MongoSvcUriImpl.java.initMorphiaMapper:154] - Initialized Morphia in 12538ms
2021-03-28T19:32:12.319+0000 [main] INFO  com.xgen.svc.mms.dao.mongo.MongoSvcUriImpl [MongoSvcUriImpl.java.<init>:89] - Created MongoSvc with 1 client(s)
[Starting Logging - App Version: 4.2.23.57072.20210126T1748Z]
2021-03-28T19:33:07.998+0000 [main] INFO  com.xgen.svc.core.ServerMain [ServerMain.java.doPreFlightCheck:295] - Starting pre-flight checks
2021-03-28T19:33:20.990+0000 [main] INFO  com.xgen.svc.mms.dao.mongo.MongoSvcUriImpl [MongoSvcUriImpl.java.initMorphiaMapper:154] - Initialized Morphia in 12920ms
2021-03-28T19:33:21.555+0000 [main] INFO  com.xgen.svc.mms.dao.mongo.MongoSvcUriImpl [MongoSvcUriImpl.java.<init>:89] - Created MongoSvc with 1 client(s)
2021-03-28T19:33:22.983+0000 [main] INFO  com.xgen.svc.core.ServerMain [ServerMain.java.doPreFlightCheck:301] - Successfully finished pre-flight checks
2021-03-28T19:33:22.984+0000 [main] INFO  com.xgen.svc.core.ServerMain [ServerMain.java.start:308] - Starting mms...
2021-03-28T19:33:23.142+0000 [main] INFO  com.xgen.svc.core.ServerMain [ServerMain.java.createNonSSLConnector:843] - Creating HTTP listener on *:8080
2021-03-28T19:33:23.438+0000 [main] ERROR com.xgen.svc.core.ServerMain [ServerMain.java.main:226] - Cannot start mms server [FATAL-EXITING] - instance: 0  - msg: unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:803)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.startThread(QueuedThreadPool.java:660)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.ensureThreads(QueuedThreadPool.java:642)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.doStart(QueuedThreadPool.java:182)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
        at org.eclipse.jetty.server.Server.start(Server.java:423)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
        at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
        at org.eclipse.jetty.server.Server.doStart(Server.java:387)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at com.xgen.svc.core.ServerMain.start(ServerMain.java:424)
        at com.xgen.svc.core.ServerMain.main(ServerMain.java:221)

mms0-startup.log

[23,180s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 512k, guardsize: 0k, detached.
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[19,947s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 512k, guardsize: 0k, detached.
Cannot start mms server [FATAL-EXITING] - instance: 0  - msg: unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:803)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.startThread(QueuedThreadPool.java:660)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.ensureThreads(QueuedThreadPool.java:642)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.doStart(QueuedThreadPool.java:182)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
        at org.eclipse.jetty.server.Server.start(Server.java:423)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
        at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
        at org.eclipse.jetty.server.Server.doStart(Server.java:387)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at com.xgen.svc.core.ServerMain.start(ServerMain.java:424)
        at com.xgen.svc.core.ServerMain.main(ServerMain.java:221)

ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1544321
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62987
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
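
Note that `ulimit -a` reports the limits of the shell that ran it, which are not necessarily the limits the mongodb-mms service process ends up with (a service started by the init system does not inherit them from an interactive login). A minimal sketch, assuming Linux, of parsing the limits the kernel actually applies to a running process from `/proc/<pid>/limits`. The function names are illustrative and not part of Ops Manager:

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are kept as strings because they may be 'unlimited'.
    """
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Limit names contain single spaces; columns are separated by
        # runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) < 3:
            continue
        limits[parts[0]] = (parts[1], parts[2])
    return limits


def read_proc_limits(pid="self"):
    """Read and parse the limits of a process (default: this process)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_proc_limits(handle.read())
```

Calling `read_proc_limits(<pid of the Java process>)["Max processes"]` while the service is starting would show whether the `max user processes` value above is really the one in effect for it.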

Suggestion: focus on your JVM:

  • Make sure you have a 64-bit version of Java
  • Try adjusting your JVM parameters:

https://docs.opsmanager.mongodb.com/current/reference/troubleshooting/system/

  1. Open mms.conf in your preferred text editor.

  2. Find this line:

    JAVA_MMS_UI_OPTS="${JAVA_MMS_UI_OPTS} -Xss228k -Xmx4352m -Xms4352m -XX:NewSize=600m -Xmn1500m -XX:ReservedCodeCacheSize=128m -XX:-OmitStackTraceInFastThrow"
    
  3. Change the -Xmx and -Xms values to a larger value. Both parameters should be set to the same value to remove any performance impact from the VM constantly reclaiming memory from the heap.

The value is specified as #k|m|g: a number followed by k (kilobytes), m (megabytes), or g (gigabytes).

By default, -Xmx and -Xms are both set to 4,352 MB (4352m).

EXAMPLE: To set the Java heap to 10 GB, set this value to:

-Xmx10g -Xms10g
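
The `#k|m|g` notation, and the `-Xss` value in the same options line, can be illustrated with a small sketch. It converts a size to bytes and estimates how much address space a given number of thread stacks would reserve. Thread stacks are allocated outside the Java heap, so a larger `-Xmx` does not by itself make room for more threads. The helper names and the thread count of 200 are hypothetical:

```python
import re

# JVM size suffixes are binary multiples.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(value):
    """Convert a JVM size such as '228k', '4352m' or '10g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"not a JVM size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def find_option(opts, prefix):
    """Return the value of the last option starting with prefix, or None."""
    found = None
    for token in opts.split():
        if token.startswith(prefix):
            found = token[len(prefix):]
    return found


def stack_reservation_bytes(opts, thread_count):
    """Estimate the stack space reserved by thread_count threads (-Xss each)."""
    xss = find_option(opts, "-Xss")
    if xss is None:
        raise ValueError("no -Xss option present")
    return parse_jvm_size(xss) * thread_count


# Options taken from the mms.conf line quoted above.
OPTS = "-Xss228k -Xmx4352m -Xms4352m -XX:NewSize=600m -Xmn1500m"
print(parse_jvm_size(find_option(OPTS, "-Xmx")))   # heap size in bytes
print(stack_reservation_bytes(OPTS, 200))          # stacks for 200 threads
```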

Strong suggestion: I would keep looking at the JVM settings, but this link may also be relevant:

I encountered a similar issue in our Test Ops Manager deployment when we upgraded to Ops Manager 1.8.0. I ultimately opened up a ticket with MongoDB Support and this was the resolution for our issue:

The Ops Manager components are launched using the default username "mongodb-mms". Please adjust the ulimit settings for this user to match those of the "mongodb" user, currently defined in /etc/security/limits.d/99-mongodb-mms-automation-agent.conf.

You may wish to add a separate file under /etc/security/limits.d/ for the mongodb-mms user.

More information can be found here.


New information:

So I tried a fresh install with the same version of MongoDB (4.4.3) and Ops Manager (4.4.8.100) to check whether something was wrong with the newest versions. It throws the same error.

I tried running jconsole -debug ->

[1,323s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached

This suggests that you may be running out of threads.

Related links:
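
`pthread_create` failing with `EAGAIN` generally points at a limit on the number of threads or processes rather than at the Java heap. A hedged sketch, assuming Linux, that collects the usual ceilings: the kernel-wide `threads-max` and `pid_max`, and, if a cgroup directory is given, its `pids.max` and `pids.current` (a container host or systemd may set these per service or per user). The cgroup directory is an assumption and differs between systems; the function names are illustrative:

```python
from pathlib import Path


def parse_ceiling(raw):
    """Interpret a limit file's contents: 'max' means unlimited (None)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def thread_ceilings(cgroup_dir=None, proc_sys="/proc/sys/kernel"):
    """Return the thread/process ceilings that exist on this machine.

    cgroup_dir is the directory of the pids controller for the service,
    which must be looked up on the host; files that do not exist are skipped.
    """
    candidates = {
        "threads-max": Path(proc_sys) / "threads-max",
        "pid_max": Path(proc_sys) / "pid_max",
    }
    if cgroup_dir is not None:
        candidates["pids.max"] = Path(cgroup_dir) / "pids.max"
        candidates["pids.current"] = Path(cgroup_dir) / "pids.current"

    found = {}
    for name, path in candidates.items():
        if path.is_file():
            found[name] = parse_ceiling(path.read_text())
    return found


if __name__ == "__main__":
    for name, value in thread_ceilings().items():
        print(name, "unlimited" if value is None else value)
```

If `pids.current` is close to `pids.max` when the service fails, the limit comes from the cgroup rather than from `ulimit`, which would also explain why raising ulimit values or running as root made no difference.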

https://github.com/elastic/elasticsearch/issues/31982

Elasticsearch version (bin/elasticsearch --version): 6.3.1

JVM version (java -version):10

OS version:centos

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached but my os has free 80g memory

i used docker.elastic.co/elasticsearch/elasticsearch:6.3.1,

jvm config:
-Xms32g
-Xmx32g

...

[I had] a similar (but likely unrelated) issue in our app (which is using the ES client). For whatever reason, it had gone berserk during the weekend, spawning 9400 threads which made the machine fail in new thread creation for the same user account.

ps -o nlwp,pid -fe helped me spot this, so I could kill the bad process and get the system back to a usable state. Greatly appreciated!

Here is a sample of ps -o nlwp,pid -fe from my Ubuntu system (an AWS virtual machine). I suspect your "ps" output will look very, very different:

# ps -o nlwp,pid -fe
NLWP   PID
   1 13409
   1 13410
   1 13418
   1   915
   1   911
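
To turn such output into a quick diagnosis, the NLWP column (threads per process) can be summed and sorted. A minimal sketch, assuming the two-column `NLWP PID` layout shown above; `parse_nlwp`, `total_threads` and `top_thread_users` are hypothetical helper names:

```python
def parse_nlwp(ps_output):
    """Parse 'NLWP PID' rows into a list of (thread_count, pid) tuples.

    The header line and any other non-numeric lines are ignored.
    """
    rows = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1])))
    return rows


def total_threads(rows):
    """Total number of threads across all listed processes."""
    return sum(nlwp for nlwp, _pid in rows)


def top_thread_users(rows, n=5):
    """The n processes with the most threads, largest first."""
    return sorted(rows, key=lambda row: row[0], reverse=True)[:n]


SAMPLE = """\
NLWP   PID
   1 13409
   1 13410
   1 13418
   1   915
   1   911
"""

rows = parse_nlwp(SAMPLE)
print(total_threads(rows))
print(top_thread_users(rows, n=3))
```

On an affected machine, a single process with thousands of threads near the top of this list (as in the 9400-thread case quoted above) would be the first thing to investigate.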

Addendum:

I switched the OS from Ubuntu 18.04 LTS (64-bit) to CentOS 8, and now it's working perfectly.