LocalNodeFirstLoadBalancingPolicy also adds nodes from datacenters other than local_dc

The documentation for LocalNodeFirstLoadBalancingPolicy states:

Selects local node first and then nodes in local DC in random order. Never selects nodes from other DCs. For writes, if a statement has a routing key set, this LBP is token aware - it prefers the nodes which are replicas of the computed token to the other nodes.

But in my Spark job logs, I can see that nodes from the entire cluster are being added, not just those in local_dc.

21/05/05 10:08:40 INFO CassandraWriter$: Setting local_dc: DC1
21/05/05 10:08:40 INFO CassandraWriter$: Writing to DC: DC1, available host ips: x.x.x.54,x.x.x.237,x.x.x.168,x.x.x.197,x.x.x.219
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.219:9042 added
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.237:9042 added
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.54:9042 added
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.238:9042 added
21/05/05 10:08:41 INFO LocalNodeFirstLoadBalancingPolicy: Added host x.x.x.238 (DC2)
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.168:9042 added
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.42:9042 added
21/05/05 10:08:41 INFO LocalNodeFirstLoadBalancingPolicy: Added host x.x.x.42 (DC2)
21/05/05 10:08:41 INFO Cluster: New Cassandra host /x.x.x.109:9042 added
21/05/05 10:08:41 INFO LocalNodeFirstLoadBalancingPolicy: Added host x.x.x.109 (DC2)

Can someone help me understand why the DC2 nodes are being added? As I understand it, the coordinator node is always chosen from local_dc.

I also tried running the ingestion without setting spark.cassandra.connection.local_dc, and saw the same logs.

The write code is below:

records.write.cassandraFormat(table, keySpace)
  .mode(SaveMode.Append)
  .option(CassandraConnectorConf.LocalDCParam.name, cassandraDC.name)
  .option(CassandraConnectorConf.ConnectionHostParam.name, cassandraDC.availableHosts.mkString(","))
  .save()

PS: I have separate Spark and Cassandra clusters, and my use case is writing data from the Spark cluster to Cassandra.

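You can ignore these messages. This is how Cassandra works: at initialization the driver discovers the full topology of the cluster, and then decides to use only the nodes in the given datacenter.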

For example, messages like New Cassandra host /x.x.x.54:9042 added come from the Java driver, while messages like Added host x.x.x.238 (DC2) come from LocalNodeFirstLoadBalancingPolicy, which must override the corresponding function in the load balancing policy interface. The policy still doesn't use nodes that aren't in the local datacenter, even though it always keeps a map of all nodes.
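To make the distinction between "known" and "used" concrete, here is a minimal toy model in Scala. It is only a sketch of the idea and not the connector's actual implementation: Host, ToyLocalDcPolicy, onAdd, knownHosts and queryPlan are all hypothetical names made up for illustration, and the sorted ordering stands in for the random ordering described in the documentation.

```scala
// Toy model only: Host and ToyLocalDcPolicy are hypothetical names,
// not classes from the Java driver or the Spark Cassandra Connector.
final case class Host(address: String, dc: String)

final class ToyLocalDcPolicy(localDc: String) {
  // Every discovered host is recorded, whatever its datacenter.
  private var known: Set[Host] = Set.empty

  // Called once per discovered host; a real policy would log something
  // like "Added host x.x.x.238 (DC2)" at this point.
  def onAdd(host: Host): Unit = known += host

  // The full topology the policy is aware of.
  def knownHosts: Set[Host] = known

  // Only hosts in the local datacenter are offered as coordinators.
  // Sorted here for determinism; the documented policy uses random order.
  def queryPlan: Seq[Host] =
    known.filter(_.dc == localDc).toSeq.sortBy(_.address)
}

object ToyLocalDcPolicyDemo {
  def main(args: Array[String]): Unit = {
    val policy = new ToyLocalDcPolicy("DC1")
    Seq(
      Host("x.x.x.54", "DC1"),
      Host("x.x.x.237", "DC1"),
      Host("x.x.x.238", "DC2")
    ).foreach(policy.onAdd)

    // The DC2 host is known to the policy but never part of the plan.
    println(s"known hosts: ${policy.knownHosts.size}")
    println(s"usable hosts: ${policy.queryPlan.map(_.address).mkString(",")}")
  }
}
```

In this model the DC2 host shows up when it is added, just like in the logs from the question, but it is filtered out whenever a coordinator is chosen.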