How to configure shards in Vespa?

Question

We want to set up a 4-node cluster to host our data. The cluster hosts only one index, so all 4 nodes hold the same kind of data.

Our goal is to shard the data across the nodes. Say we have two shards and two replicas (4 nodes in total to host these 4 data partitions).

The document mode is "index" and global is set to "true".

    <redundancy>2</redundancy>

    <nodes>
      <node hostalias="node1" distribution-key="0"/>
      <node hostalias="node2" distribution-key="1"/>
      <node hostalias="node3" distribution-key="2"/>
      <node hostalias="node4" distribution-key="3"/>
    </nodes>

    <engine>
      <proton>
        <searchable-copies>2</searchable-copies>
        <flush-on-shutdown>true</flush-on-shutdown>
      </proton>
    </engine>

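The above configuration in services.xml is not allowed. It requires redundancy to be at least the number of nodes, so we have to configure

    <redundancy>4</redundancy>

and

    <searchable-copies>4</searchable-copies>

for it to be accepted as a valid configuration.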

That configures all 4 nodes to hold all of the data, with every node containing a copy. According to http://docs.vespa.ai/documentation/content/data-placement.html we need global=true, and we noticed this note:

Note: The global documents feature is under development. It is currently only available for setups where all documents are already inherently on all nodes, i.e. N groups each containing a single node.

How is the data sharded and distributed? Can we have node1 and node2 hold the distributed data, and node3 and node4 hold the replicas, with redundancy 2?

Answer 1

Thanks for the question. I can see that the documentation for global=true is a bit confusing.

In your case you want sharding, i.e. 2 copies of each document distributed across the 4 nodes (correct me if I'm wrong).

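global is normally used for parent documents, see http://docs.vespa.ai/documentation/search-definitions.html#document-references. In your case you have only a single document type (I assume), so there are no parent documents, and you should not use global.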

The global feature would place 4 copies on the 4 nodes (if that is what you want, set redundancy=4). But there is no need to use global for that either.
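
For illustration, here is a rough sketch of what the content cluster in services.xml might look like when following the advice above: redundancy 2 over the 4 nodes from the question, and no global attribute on the document type. The cluster id "mycluster" and document type "mydoc" are placeholder names that do not come from the question, and the fragment has not been validated against any particular Vespa version.

```xml
<!-- Sketch only: "mycluster" and "mydoc" are hypothetical placeholder names. -->
<content id="mycluster" version="1.0">
  <!-- Keep 2 copies of each document, spread over the 4 nodes below. -->
  <redundancy>2</redundancy>

  <documents>
    <!-- No global="true" here, since there are no parent documents. -->
    <document type="mydoc" mode="index"/>
  </documents>

  <nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
    <node hostalias="node3" distribution-key="2"/>
    <node hostalias="node4" distribution-key="3"/>
  </nodes>

  <engine>
    <proton>
      <!-- Both copies are indexed and searchable, as in the question. -->
      <searchable-copies>2</searchable-copies>
      <flush-on-shutdown>true</flush-on-shutdown>
    </proton>
  </engine>
</content>
```

With a layout like this, which nodes hold a given document's copies is decided by Vespa's distribution logic rather than by assigning "shard" and "replica" roles to particular nodes.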
