Cassandra driver exception "All host(s) tried for query failed" occurs every few hours without explanation

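I have a problem with my Cassandra cluster (4 nodes). The Cassandra version is 2.2.9 and the driver version is 3.0.3.
After a few hours (~3 hours), I see the following problems in the driver log: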

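  1. OutOfDirectMemoryError (happens occasionally, most of the time without any impact)
  2. No protocol version matching integer version
  3. Unknown response opcode
  4. Heartbeat query timed out
  5. All host(s) tried for query failed --> Cassandra can no longer be queried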

The Cassandra cluster itself is healthy, and when I restart the application everything works again for a few hours.

Log snippet:

First Time                       Count  Message
2017-11-11 19:03:03 +0100            51  [/??.???.??.??:????] preparing to open ? new connections, total = ???
2017-11-11 19:03:03 +0100            49  [/??.???.??.??:????] Connection[/??.???.??.??:????-???, inFlight=?, closed=false] Transport initialized, connection ready
2017-11-11 19:03:03 +0100            24  [/??.???.??.??:????] Connection[/??.???.??.??:????-???, inFlight=?, closed=true] closed, remaining = ???
2017-11-11 19:03:29 +0100             1  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate ??????? byte(s) of direct memory (used: ???????, max: ????????))
2017-11-11 19:03:29 +0100            14  [/??.???.??.??:????] Connection[/??.???.??.??:????-???, inFlight=???, closed=false] failed, remaining = ???
2017-11-11 19:03:29 +0100             7  [/??.???.??.??:????] Connection[/??.???.??.??:????-???, inFlight=??, closed=false] failed, remaining = ???
2017-11-11 19:03:29 +0100             1  Defuncting Connection[/??.???.??.??:????-???, inFlight=??, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: No protocol version matching integer version ?)
2017-11-11 19:03:29 +0100             5  Defuncting Connection[/??.???.??.??:????-???, inFlight=??, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ??)
2017-11-11 19:03:29 +0100             4  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ?)
2017-11-11 19:03:29 +0100             3  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode -???)
2017-11-11 19:03:30 +0100             3  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ??)
2017-11-11 19:03:30 +0100             2  Defuncting Connection[/??.???.??.??:????-???, inFlight=?, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ??)
2017-11-11 19:03:30 +0100           401  [/??.???.??.??:????] Connection[/??.???.??.??:????-???, inFlight=?, closed=false] failed, remaining = ???
2017-11-11 19:03:33 +0100             1  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ???)
2017-11-11 19:03:41 +0100           722  Defuncting Connection[/??.???.??.??:????-???, inFlight=?, closed=false] because: [/??.???.??.??:????] Heartbeat query timed out
2017-11-11 19:03:41 +0100             8  [/??.???.??.??:????] Connection[/??.???.??.??:????-?, inFlight=?, closed=false] failed, remaining = ???
2017-11-11 19:03:41 +0100            11  Defuncting Connection[/??.???.??.??:????-?, inFlight=?, closed=false] because: [/??.???.??.??:????] Heartbeat query timed out
2017-11-11 19:03:41 +0100            67  [/??.???.??.??:????] Connection[/??.???.??.??:????-??, inFlight=?, closed=false] failed, remaining = ???
2017-11-11 19:03:41 +0100           115  Defuncting Connection[/??.???.??.??:????-??, inFlight=?, closed=false] because: [/??.???.??.??:????] Heartbeat query timed out
2017-11-11 19:03:44 +0100             2  Defuncting Connection[/??.???.??.??:????-???, inFlight=??, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: com.datastax.driver.core.exceptions.DriverInternalError: Unknown response opcode ?)
2017-11-11 19:03:51 +0100             2  Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Heartbeat query timed out
2017-11-11 19:03:51 +0100           265  Failed to post timeseries data Error Returned - 
2017-11-11 19:03:57 +0100             3  Defuncting Connection[/??.???.??.??:????-???, inFlight=??, closed=false] because: [/??.???.??.??:????] Heartbeat query timed out
2017-11-11 19:04:01 +0100            39  Defuncting Connection[/??.???.??.??:????-???, inFlight=?, closed=false] because: [/??.???.??.??:????] Operation timed out
2017-11-11 19:04:01 +0100            12  Error processing jobs: execution of statement failed:All host(s) tried for query failed (tried: /??.???.??.??:???? (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /??.???.??.??:???? (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /??.???.??.??:???? (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /??.???.??.??:???? [only showing errors of first ? hosts, use getErrors() for more details])

Does anyone know what the root cause is?

Defuncting Connection[/??.???.??.??:????-???, inFlight=???, closed=false] because: [/??.???.??.??:????] Unexpected exception triggered (io.netty.handler.codec.DecoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate ??????? byte(s) of direct memory (used: ???????, max: ????????))

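You have memory problems. As long as they persist, you cannot expect the driver to work properly. You also say that your application stops working after a few hours. To me, it looks like your application has a memory leak.

Check the direct memory settings your application uses and make sure there is enough memory available for the driver to allocate. The driver needs to allocate direct memory (the failed allocation in the log line above comes from io.netty). I have seen similar issues when that allocation fails: the error is reported as a NoHostAvailableException even though the underlying cause is memory.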
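
One way to check whether direct memory grows over those few hours is to log the JVM's direct buffer pool periodically from inside the application. The following is only a minimal sketch using the standard `java.lang.management` API; the class name `DirectMemoryProbe` and its method are made up for illustration. Note the assumption: this bean reports direct buffers allocated through the JDK, and depending on the Netty version and configuration, Netty may keep its own counter (the `used`/`max` values in the error message) that this bean does not reflect. The upper limit for JDK direct buffers is normally set with the `-XX:MaxDirectMemorySize` JVM option.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, not part of the Cassandra driver.
final class DirectMemoryProbe {

    /** Returns the bytes used by the JDK "direct" buffer pool, or -1 if the pool is not found. */
    static long directMemoryUsed() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("direct memory used before: " + directMemoryUsed());

        // Allocate 1 MiB of direct memory to show that the counter moves.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);

        System.out.println("direct memory used after:  " + directMemoryUsed());
        System.out.println("buffer capacity:           " + buffer.capacity());
    }
}
```

If the value logged this way climbs steadily until the failures start, that would support the memory leak theory; if it stays flat, the leak may be in memory that Netty tracks separately, or elsewhere.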