Elasticsearch store field vs _source

Question

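Using Elasticsearch 1.4.3

I'm building a kind of "reporting" system, and the client can pick which fields they want returned in the results.

In 90% of cases the client will never pick all the fields, so I figured I could disable the _source field in my mapping to save space. But then I learned that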

GET myIndex/myType/_search/
{
    "fields": ["field1", "field2"]
    ...
}

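doesn't return the fields.

So I assume I have to use "store": true on each field. From what I've read this will be faster for searches, but I guess that space-wise it will be the same as _source. Or do we still save space?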

Answer 1

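Enabling _source stores the entire JSON document in the index, whereas store only stores the individual fields that are marked as such. So if you want to save disk space, using store would probably be better than using _source.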
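
For concreteness, the two setups being compared can be sketched as index-creation bodies. This is only an illustration in Python that builds the JSON, assuming the 1.x-style mapping syntax that matches the question's version. The field names, the string type, and the helper names are made up for the example.

```python
import json

FIELDS = ["field1", "field2", "field3"]  # hypothetical field names


def mapping_with_source(fields):
    """Default setup: _source enabled, individual fields not stored."""
    return {
        "mappings": {
            "myType": {
                "_source": {"enabled": True},
                "properties": {name: {"type": "string"} for name in fields},
            }
        }
    }


def mapping_with_stored_fields(fields):
    """Setup from the question: _source disabled, every field stored."""
    return {
        "mappings": {
            "myType": {
                "_source": {"enabled": False},
                "properties": {
                    name: {"type": "string", "store": True} for name in fields
                },
            }
        }
    }


print(json.dumps(mapping_with_stored_fields(FIELDS), indent=2))
```

Whether the second variant really ends up smaller on disk depends on how many fields are stored, so it is worth measuring on real data before committing to it.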

Answer 2

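The _source field stores the JSON you send to Elasticsearch, and you can choose to return only certain fields if needed, which is perfect for your use case. I have never heard that stored fields are faster for searching. The _source field may be bigger on disk, but if you have to store every field anyway, there is no point in using stored fields instead of the _source field. If you do disable the source field, it means that:

You won't be able to do partial updates
You won't be able to re-index your data from the JSON in your Elasticsearch cluster; you'll have to re-index from the data source (which is usually a lot slower).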
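
The "return only certain fields" option mentioned above is source filtering: _source stays enabled and the search body lists the fields that are wanted. A minimal sketch of building such a body in Python, where field1 and field2 come from the question and the match_all query and the helper name are assumptions for illustration:

```python
import json


def report_search_body(selected_fields, query=None):
    """Build a search body that returns only the selected parts of _source."""
    return {
        "query": query if query is not None else {"match_all": {}},
        "_source": list(selected_fields),  # source filtering
    }


body = report_search_body(["field1", "field2"])
print(json.dumps(body, indent=2))
```

The body would be sent to GET myIndex/myType/_search, and each hit then carries a _source containing just those fields.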

Answer 3

Clinton Gormley says in the link below:

https://groups.google.com/forum/#!topic/elasticsearch/j8cfbv-j73g/discussion

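By default, ES stores your JSON doc in the _source field, which is set to "stored"
By default, the fields in your JSON doc are set to NOT be "stored" (i.e. stored as a separate field)
So when ES returns your doc (search or get), it just loads the _source field and returns that, i.e. a single disk seek

Some people think that storing individual fields will be faster than loading the whole JSON doc from the _source field. What they don't realise is that each stored field requires a disk seek (10ms each seek!), and that the sum of those seeks far outweighs the cost of just sending the _source field.

In other words, it is almost always a false optimization.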
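
A back-of-the-envelope version of that argument, taking the quoted 10 ms per seek at face value and assuming one seek per stored field versus a single seek for _source. It ignores caching and the cost of parsing the JSON, so treat it as an illustration of the reasoning, not a benchmark.

```python
SEEK_MS = 10  # per-seek cost quoted above


def stored_fields_cost_ms(num_fields, seek_ms=SEEK_MS):
    """One seek per stored field requested."""
    return num_fields * seek_ms


def source_cost_ms(seek_ms=SEEK_MS):
    """_source is a single stored field, so a single seek."""
    return seek_ms


for n in (1, 5, 10):
    print(n, stored_fields_cost_ms(n), source_cost_ms())
```

Under these assumptions the two only break even when a single stored field is requested; at ten fields the stored-field route costs 100 ms against 10 ms.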

Answer 4

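By default in elasticsearch, the _source (the document that was indexed) is stored. This means that when you search, you can get the actual document source back. Moreover, elasticsearch will automatically extract fields/objects from the _source and return them if you explicitly ask for them (and may also use it in other components, like highlighting).

You can specify that a specific field is also stored. This means that the data for that field will be stored on its own. So if you ask for field1 (which is stored), elasticsearch will identify that it is stored and load it from the index instead of getting it from the _source (assuming _source is enabled).

When do you want to enable storing specific fields? Most of the time, you don't. Fetching the _source is fast, and extracting from it is fast as well. If you have very large documents, where the cost of storing the _source or the cost of parsing the _source is high, you can explicitly map some fields to be stored instead.

Note that there is a cost to retrieving each stored field. So, for example, if you have a json with 10 fields of reasonable size, and you map all of them as stored and ask for all of them, this means loading each one (more disk seeks), compared to just loading the _source (which is one field, possibly compressed).

I got this answer from a post by shay.banon; you can read the whole thread to understand it better.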
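
The lookup behaviour described above can be mimicked with a toy model. This is not Elasticsearch code, just a sketch of the rule "use the stored copy if there is one, otherwise extract from _source", with a counter for separate loads. The function name and the ten-field document are invented for the example.

```python
def fetch_fields(requested, stored, source):
    """Toy model of field retrieval; not Elasticsearch code.

    stored: dict of individually stored field values
    source: the original document as a dict, or None if _source is disabled
    Returns (values, loads), where loads counts separate field loads.
    """
    values = {}
    loads = 0
    source_loaded = False
    for name in requested:
        if name in stored:
            values[name] = stored[name]  # loaded on its own
            loads += 1
        elif source is not None and name in source:
            if not source_loaded:  # _source is loaded once, then reused
                loads += 1
                source_loaded = True
            values[name] = source[name]
    return values, loads


doc = {"f%d" % i: i for i in range(10)}  # hypothetical 10-field document

# All ten fields mapped as stored: ten separate loads.
print(fetch_fields(doc.keys(), stored=doc, source=doc)[1])   # 10
# Nothing stored, _source enabled: one load, fields extracted from it.
print(fetch_fields(doc.keys(), stored={}, source=doc)[1])    # 1
# Nothing stored, _source disabled: nothing comes back.
print(fetch_fields(doc.keys(), stored={}, source=None)[0])   # {}
```

The last case is the situation from the question: with _source disabled and no stored fields, there is nothing to return.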

Answer 5

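As a reference for ES 7.3, the answer becomes even clearer. Don't try to optimize before you have solid, tested reasons under real production conditions.

I might just quote from the _source documentation: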

Users often disable the _source field without thinking about the consequences, and then live to regret it. If the _source field isn't available then a number of features are not supported:

The update, update_by_query, and reindex APIs.

On the fly highlighting.

The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.

The ability to debug queries or aggregations by viewing the original document used at index time.

Potentially in the future, the ability to repair index corruption automatically.

TIP: If disk space is a concern, rather increase the compression level instead of disabling the _source.
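
As far as I can tell, the compression level in that tip refers to the index.codec index setting, whose best_compression value trades some stored-field read speed for a smaller index. A minimal sketch of a create-index body that sets it while leaving _source enabled. The helper name is an assumption, and since the codec is a static setting it would be supplied at index creation or on a closed index.

```python
import json


def create_index_body(best_compression=True):
    """Index settings that keep _source and compress stored data harder."""
    codec = "best_compression" if best_compression else "default"
    return {"settings": {"index": {"codec": codec}}}


print(json.dumps(create_index_body(), indent=2))
```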

Also, using stored_fields doesn't bring the obvious advantages you might expect.

If you only want to retrieve the value of a single field or of a few fields, instead of the whole _source, then this can be achieved with source filtering.
