How to retrieve all documents (size greater than 10000) in an Elasticsearch index

Question

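I am trying to fetch all the documents in an index, and I have tried the following:

1) Getting the total record count first and then setting the /_search?size= parameter. This does not work, because the size parameter is capped at 10000.

2) Paginating over multiple calls with the parameters '?size=1000&from=9000'. This worked as long as 'from' was below 9000, but once it went past 9000 I hit the same size-limit error again: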

"Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting"

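So how do I retrieve all the documents in the index? I have read some answers that suggest using the scroll API, and even the documentation states: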

"While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database."

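But I could not find any example query that fetches all the records in a single request.

My index holds 388794 documents in total. Also note that this is a one-off call, so I am not worried about performance.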

Answer 1

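Figured out the solution: the scroll API is the right way to do it. Here is how it works:

In the first call that fetches documents, you can provide a size (say 1000) and a scroll parameter that specifies how long the search context is kept alive before it times out (1m below means one minute).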

POST /index/type/_search?scroll=1m
{
    "size": 1000,
    "query": {....
    }
}

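For all subsequent calls, we use the scroll_id returned in the response of the first call (and, after that, the one returned by the most recent scroll call) to fetch the next batch of records. Repeat until a response comes back with no hits.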

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DnF1ZXJ5VGhIOLSJJKSVNNZZND344D123RRRBNMBBNNN===" 
}
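The two requests above can be wrapped in a loop. What follows is only a minimal sketch, not an official client: `post_json` and `scroll_all` are hypothetical helper names, the base URL and index name are placeholders, and the path omits the `type` segment on the assumption that the cluster accepts `/<index>/_search`. It uses only the Python standard library and stops when a batch comes back empty.

```python
import json
import urllib.request


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response (hypothetical helper)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scroll_all(base_url, index, query, size=1000, keep_alive="1m", post=post_json):
    """Yield every hit matching `query`, fetching `size` documents per request.

    `post` is injectable so the loop can be exercised without a live cluster.
    """
    # First call: opens the search context and returns the first batch.
    page = post(
        f"{base_url}/{index}/_search?scroll={keep_alive}",
        {"size": size, "query": query},
    )
    # Keep scrolling until a batch comes back empty.
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always pass the scroll_id from the most recent response.
        page = post(
            f"{base_url}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Assuming a node at http://localhost:9200, something like `for doc in scroll_all("http://localhost:9200", "index", {"match_all": {}}): ...` would walk through the whole index in batches of 1000. The keep-alive only has to cover the time between two consecutive requests, not the whole export.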
