Is there a fix for too many open files error in Gridgain cluster?

The GridGain cluster fails with the error "Too many open files". I have already set the ulimit values as GridGain recommends, i.e. nofile --> 65536.

Even after increasing nofile to 65536, the cluster still crashes. Is there anything else I should check or configure?

```
[15:29:12,957][SEVERE][nio-acceptor-client-listener-#80][ClientListenerProcessor] Failed to accept remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to accept connection: GridWorker [name=nio-acceptor-client-listener, igniteInstanceName=null, finished=false, heartbeatTs=1651073352943, hashCode=804329932, interrupted=false, runner=nio-acceptor-client-listener-#80]
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:3081)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.body(GridNioServer.java:3002)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:421)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:249)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.processSelectedKeys(GridNioServer.java:3131)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:3060)
    ... 3 more
```

There are (at least) two possibilities here.

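  1. The file descriptor limit was not actually increased. It can be set at several levels (system-wide, per user), and setting it in the wrong place has no effect. You don't say how you set it, so I'll just point you to this article: https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
  2. If you have a small number of nodes and a large number of caches/tables, you may need more than 2^16 (65536) file descriptors. If that is the case, you need to raise the limit further.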
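
One way to tell the two cases apart is to look at the limit the running node process actually has, and compare it with the number of descriptors it currently holds. The sketch below is only an illustration, assuming a Linux host where `/proc/<pid>/limits` and `/proc/<pid>/fd` are readable; the function names are made up for this example and are not part of GridGain or Ignite.

```python
import os
from pathlib import Path


def _to_int(value):
    # /proc reports "unlimited" when no cap applies; map that to None.
    return None if value == "unlimited" else int(value)


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


def nofile_limits(pid):
    """Limits that apply to the running process, not to the current shell."""
    return parse_nofile_limits(Path(f"/proc/{pid}/limits").read_text())


def open_fd_count(pid):
    """Number of descriptors the process holds right now (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Calling `nofile_limits(pid)` with the PID of the server JVM shows whether 65536 really reached the process. If the soft limit is still a lower value, the setting was applied in the wrong place (case 1), for example because the node is started by something other than the shell where `ulimit` was changed. If the limit is 65536 and `open_fd_count(pid)` is close to it, the node genuinely needs more descriptors (case 2).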