Why does Elasticsearch bulk insert use \n delimiters instead of an array of JSON objects?

Question

Here is the bulk insert example from the Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html:

POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

The documentation notes: "Because this format uses literal \n's as delimiters, please be sure that the JSON actions and sources are not pretty printed".
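To make the "not pretty printed" requirement concrete, here is a minimal Python sketch that writes each action and source as one compact JSON line. The helper name `to_bulk_body` is hypothetical and not part of any Elasticsearch client; it relies only on the standard `json` module.

```python
import json

def to_bulk_body(objects):
    # json.dumps without `indent` emits no raw newlines; a newline inside a
    # string value is escaped as \n, so every object stays on a single line.
    # The bulk body must also end with a trailing newline.
    return "".join(json.dumps(obj) + "\n" for obj in objects)

body = to_bulk_body([
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_type": "type1", "_id": "2"}},
])
print(body, end="")
```

Passing `indent=2` to `json.dumps` would spread one object across several lines, which is exactly what the quoted warning is about.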

I would like to know the reasoning behind this input format, and why they did not choose an array of JSON objects instead.

For example:

POST _bulk
    [{{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }},
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
    { "doc" : {"field2" : "value2"} }]

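The structure above is not exactly valid, but it would look something like that. Is there a common convention in REST API development that I am missing, where delimiters are used instead of an array?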

Answer 1

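This allows the bulk endpoint to process the body one or two lines at a time: an action line, followed by a source line when the action has one. If the body were a JSON array, ES would have to load and parse the entire JSON body into memory before it could extract the array elements one by one.

Since a bulk body can be very large (i.e. hundreds of MB), this is an optimization that keeps your ES server from crashing when you send large bulk requests.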
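The idea can be sketched in Python as follows. This is only an illustration of line-by-line processing, not how Elasticsearch itself is implemented; the function name `iter_bulk_operations` is made up for this example.

```python
import io
import json

def iter_bulk_operations(lines):
    """Yield (action, metadata, source) tuples from newline-delimited JSON.

    `lines` is any iterable of text lines, e.g. an open file or a request
    body stream. Only the current action line and, if the action has one,
    its source line are parsed at any moment.
    """
    it = iter(lines)
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        # An action line is an object with exactly one key, e.g. {"index": {...}}
        action, metadata = next(iter(json.loads(line).items()))
        # "delete" has no source line; index, create and update do
        source = None if action == "delete" else json.loads(next(it))
        yield action, metadata, source

body = io.StringIO(
    '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
    '{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }\n'
    '{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }\n'
    '{ "doc" : {"field2" : "value2"} }\n'
)

for action, metadata, source in iter_bulk_operations(body):
    print(action, metadata["_id"], source)
```

With a single JSON array, a plain `json.loads` call would need the complete body in memory before the first element could be handled. Here each operation can be executed and discarded as soon as its one or two lines have been read.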
