Cassandra Cluster - How does the seed provider work?

Question

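I have a question about how the Cassandra seed_provider works. In my environment, 3 Cassandra nodes need to be set up as a cluster. How should this be defined in cassandra.yaml? I'm confused because most tutorials give different answers.

Example: Host A - 192.168.1.1, Host B - 192.168.1.2, Host C - 192.168.1.3

Below is my current setting for host A. Is it correct?

How should hosts B and C be configured?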

# any class that implements the SeedProvider interface and has a
# constructor that takes a Map<String, String> of parameters will do.
seed_provider:
    # Addresses of hosts that are deemed contact points. 
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "192.168.1.1,192.168.1.2,192.168.1.3"

Answer 1

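For starters, you don't need to change the class_name of the seed_provider. AFAIK, there is only one that ships with Cassandra. It is defined as "pluggable" to allow custom seed providers to be written.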

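As for the seeds, I don't recommend listing every node in the seed list. If you only have 3 nodes, just provide 1 or 2. Seed nodes do not bootstrap data, so a seed node needs a repair to be consistent when it is replaced. This makes node additions difficult.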

But as far as I can tell, your current configuration will work. I would just build the seed list with at most 2 nodes.

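Remember that the seed_list has two main requirements:

If you are starting the first node in the cluster, its IP must be in the seed_list.
At least one node in the seed_list must be running.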
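
The advice above could be sketched as the following cassandra.yaml fragment. This is only an illustration, not a definitive configuration: picking hosts A and B as the two seeds is an assumption (any one or two of the three nodes would fit the recommendation), and using the same seed list on all three hosts reflects common practice rather than anything stated in the question.

```yaml
# cassandra.yaml fragment, assumed identical on hosts A, B and C.
# Choosing A and B as the seeds is an illustrative assumption.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Only two seeds are listed; host C (192.168.1.3) is left out,
          # so it is not treated as a seed when it joins the cluster.
          - seeds: "192.168.1.1,192.168.1.2"
```

With a list like this, the first node started would have to be host A or host B, since the first node's IP must appear in the seed list.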

Would you mind explaining further what the impact is if I go ahead and add all 3 nodes to the seed list? Why would you add only 1 or 2 nodes to the seed list?

Sure, it all goes back to:

Seed nodes do not bootstrap data

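So, specifying all 3 nodes in the seed_list on all 3 nodes leads to the following problems:

If node A is started and data is written to it before node B or C joins the cluster, that data will not be streamed to node B or C.

If node A fails in the future and is replaced, data will not be streamed to the replacement node.

In these cases, nodetool repair needs to be run to get the initial data onto the newly added nodes.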


database

rhel

cassandra

cassandra-2.0