How to get page start records from 23000 to 23004 in Elasticsearch

Question

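I have an Elasticsearch database with about 100k rows. I want to paginate over about 30k of them.

The error I get is about max-result-window.

In this case I can't fetch records 23000 to 23004, because they lie beyond the 10k-record limit. Is there a workaround?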

Answer 1

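One possible workaround I found is to use the scroll API. In practice, I scroll with a size of 20 (one page) until I reach page 51711. This takes about 10 minutes, because it has to scroll through all the preceding data before it reaches records 1070100 to 1070120.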

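from elasticsearch import Elasticsearch

url = "http://localhost:9200"
index = "civile"
pageLimit = 20

bodyPageAllDocBil = {"query": {"bool": {"must": [], "should": []}}, "_source": ["annoruolo", "annosentenza",  "cf_giudice","codiceoggetto", "controparte", "gradogiudizio", "nomegiudice", "parte","distretto"]}

# Count with a copy of the body that has no "_source" key.
# (Assigning the dict directly and calling pop() would also strip
# "_source" from the search body, because both names share one dict.)
bodyCountAllDoc = {k: v for k, v in bodyPageAllDocBil.items() if k != "_source"}
es = Elasticsearch(url)
res = es.count(index=index, body=bodyCountAllDoc)
sizeCount = res["count"]

# Search body: same query and "_source" filter, one page per scroll batch.
bodyPageAllDocs = dict(bodyPageAllDocBil, size=pageLimit)

# The initial search returns the first page and opens the scroll context.
docs = es.search(index=index, body=bodyPageAllDocs, scroll='10m')

currentSize = pageLimit
scrollId = docs["_scroll_id"]
page = 51711
paginationStart = (page - 1) * pageLimit

# Each scroll call returns the batch that starts at offset currentSize.
while currentSize <= paginationStart:
    docs = es.scroll(scroll_id=scrollId, scroll='10m')
    countRec = len(docs["hits"]["hits"])
    scrollId = docs['_scroll_id']

    if countRec == 0:
        break  # ran out of results before reaching the requested page

    if currentSize == paginationStart:
        print(docs["hits"]["hits"][0])
        print(docs["hits"]["hits"][1])
        #...
        break  # requested page found, no need to keep scrolling

    currentSize = currentSize + countRec

# Release the scroll context on the server.
es.clear_scroll(scroll_id=scrollId)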
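
Most of the ten minutes is probably spent on round trips: with a scroll size of 20, reaching page 51711 takes the initial search plus tens of thousands of scroll calls. One possible variation is to scroll in larger batches and slice the requested page out of the batch that contains it. The sketch below only does the arithmetic for that idea. The helper name locate_page and the batch size of 1000 are assumptions for illustration and are not part of the code above; the batch size is assumed to be a multiple of the page size, so that a page never straddles two batches.

```python
def locate_page(page, page_size, batch_size):
    """Return (scroll_calls, offset_in_batch) for a 1-based page number.

    scroll_calls counts the es.scroll() requests needed after the initial
    es.search(); offset_in_batch is where the page starts inside that batch.
    """
    if batch_size % page_size != 0:
        raise ValueError("batch_size must be a multiple of page_size")
    start = (page - 1) * page_size  # zero-based index of the page's first record
    return start // batch_size, start % batch_size


# Page 51711 with 20 records per page, as in the answer above.
print(locate_page(51711, 20, 20))    # batches of 20: (51710, 0)
print(locate_page(51711, 20, 1000))  # batches of 1000: (1034, 200)
```

With batches of 1000, the page would be hits[200:220] of the batch returned by the 1034th scroll call, so the number of requests drops by a factor of about 50. How much wall-clock time that saves depends on the cluster and the document size, so it is worth measuring rather than assuming.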

python

pagination

elasticsearch