Datastax agent failing to report metrics once in a while

Question

I'm running a DSE 4.6.5 cluster (Cassandra 2.0.14.352) and OpsCenter 5.1.1.

Once or twice a day, one of the nodes (sometimes more than one) stops reporting metrics until I manually restart the datastax-agent.

Until I restart it, the agent process is still alive. This is the agent log:

WARN [Thread-13] 2015-04-14 23:20:23,277 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,277 131176 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,277 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,277 131177 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131178 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131179 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131180 operations dropped so far.
ERROR [cassandra-processor-1] 2015-04-14 23:20:24,387 Error when proccessing cassandra call
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
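Because the agent keeps running while it silently drops work, the log above is the most reliable signal that a node has stopped reporting. The following is a minimal Python sketch (not part of DataStax tooling) that scans agent log lines in the format shown above and summarizes the "queue is full" warnings, the highest "operations dropped so far" counter, and NoHostAvailableException errors. The function name `scan_agent_log` and the idea of feeding it the agent's log file (often `/var/log/datastax-agent/agent.log` on package installs, but treat that path as an assumption) are illustrative only.

```python
import re

# Matches lines like:
#   WARN [Thread-13] 2015-04-14 23:20:23,278 131180 operations dropped so far.
# The layout (level, [thread], date, time,millis, message) is taken from the
# log excerpt in the question; other agent versions may format it differently.
DROPPED_RE = re.compile(
    r"^WARN \[[^\]]+\] \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<count>\d+) operations dropped so far\.$"
)
QUEUE_FULL = "Cassandra operation queue is full"
NO_HOST = "NoHostAvailableException"


def scan_agent_log(lines):
    """Summarize queue-full warnings and driver errors in agent log lines."""
    queue_full = 0
    max_dropped = 0
    no_host = 0
    for line in lines:
        line = line.rstrip("\n")
        match = DROPPED_RE.match(line)
        if match:
            # The agent logs a running total, so keep the largest value seen.
            max_dropped = max(max_dropped, int(match.group("count")))
        elif QUEUE_FULL in line:
            queue_full += 1
        if NO_HOST in line:
            no_host += 1
    return {"queue_full": queue_full, "max_dropped": max_dropped, "no_host_errors": no_host}


if __name__ == "__main__":
    sample = [
        "WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation",
        "WARN [Thread-13] 2015-04-14 23:20:23,278 131180 operations dropped so far.",
        "ERROR [cassandra-processor-1] 2015-04-14 23:20:24,387 Error when proccessing cassandra call",
        "com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)",
    ]
    print(scan_agent_log(sample))
    # {'queue_full': 1, 'max_dropped': 131180, 'no_host_errors': 1}
```

Run periodically against the tail of the log, a non-zero `queue_full` or a growing `max_dropped` suggests the agent on that node has stalled and needs attention; this only detects the symptom and does not address the underlying cause.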

Please note:

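All nodes are in the same datacenter, with the same hardware specs and identical configuration.
The nodes use two NICs, so rpc_address and listen_address are on different networks.
OpsCenter runs on one of the cluster nodes.
The workload is write-heavy: please see my other question.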

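To sum up: on one of the machines (a different one each time), the agent stops reporting data, while on the others it keeps working fine. Restarting the agent service fixes the problem, but shouldn't the agent recover on its own? Is this a bug? How can I fix it?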

Let me know if you need more information. Thanks.

Answer 1

I've seen the same thing. There are two things you can try.

1) Exclude or limit the keyspaces/column families you collect metrics from. http://docs.datastax.com/en/opscenter/5.1/opsc/configure/opscControllingDataCollection_c.html?scroll=concept_ds_jlq_xk4_gk

2) Store OpsCenter's data in a separate cluster (for example, a small one- or two-node cluster kept apart from the main cluster). http://www.datastax.com/dev/blog/storing-opscenter-data-in-a-separate-cluster

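Honestly, option 2 is the smarter choice. You don't need large nodes for it, and if you collect metrics on the main cluster and that cluster goes down, you're flying blind.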
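Both options come down to a few lines in OpsCenter's per-cluster configuration file. The following is a minimal Python sketch that emits such a fragment with `configparser`; it assumes the section and option names `[cassandra_metrics]` with `ignored_keyspaces` / `ignored_column_families`, and `[storage_cassandra]` with `seed_hosts`, which is my recollection of the OpsCenter 5.1 docs, so verify them against the two pages linked above. The keyspace, table, and host names below are hypothetical placeholders.

```python
import configparser
import io


def build_cluster_conf(ignored_keyspaces, ignored_column_families, storage_seeds):
    """Return an INI fragment for OpsCenter's per-cluster config (assumed option names)."""
    conf = configparser.ConfigParser()
    # Option 1: stop collecting metrics for these keyspaces and
    # keyspace.column_family pairs (assumed to be comma-separated lists).
    conf["cassandra_metrics"] = {
        "ignored_keyspaces": ", ".join(ignored_keyspaces),
        "ignored_column_families": ", ".join(ignored_column_families),
    }
    # Option 2: write OpsCenter's own metric data to a separate, small cluster
    # reachable through these seed hosts instead of the monitored cluster.
    conf["storage_cassandra"] = {
        "seed_hosts": ", ".join(storage_seeds),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_cluster_conf(
        ["system_traces"],                          # hypothetical choice
        ["my_keyspace.big_table"],                  # hypothetical keyspace.table
        ["opsc-storage-1", "opsc-storage-2"],       # hypothetical storage nodes
    ))
```

The output would go into the cluster's file under the OpsCenter configuration directory (typically `clusters/<cluster_name>.conf`, but check the docs for your install type and for any extra options the storage cluster needs, such as its port), followed by an opscenterd restart.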


Tags: datastax-enterprise, opscenter, datastax