How to get the page of records from 23000 to 23004 in Elasticsearch
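I have an Elasticsearch database with about 100k rows. I want to paginate through about 30k of them.
The error I get refers to max-result-window (the index.max_result_window setting).
In this case I cannot fetch records 23000 to 23004, because they lie beyond the first 10k records. Is there a workaround?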
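For context, here is a minimal sketch of the arithmetic behind the error. It assumes plain from/size paging, the default index.max_result_window of 10,000, and that the record numbers are 0-based offsets. The helper names page_window and exceeds_result_window are hypothetical and not part of any Elasticsearch client.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_window(page, page_limit):
    """Return (from_, size) for a 1-based page number."""
    if page < 1 or page_limit < 1:
        raise ValueError("page and page_limit must be >= 1")
    return (page - 1) * page_limit, page_limit


def exceeds_result_window(from_, size, max_window=DEFAULT_MAX_RESULT_WINDOW):
    """True when from_ + size is larger than the window, so the search is rejected."""
    return from_ + size > max_window


# Records 23000 to 23004 correspond to from=23000, size=5.
print(exceeds_result_window(23000, 5))  # True
print(exceeds_result_window(9995, 5))   # False
print(page_window(51711, 20))           # (1034200, 20)
```

Any request whose from plus size goes past the window fails this check, no matter how few records are actually requested.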
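One possible workaround I found is the scroll API.
In practice, I scroll with a size of 20 (one page) until I reach page 51711. This takes about 10 minutes, because it scrolls through all the preceding data before reaching records 1070100 to 1070120.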
from elasticsearch import Elasticsearch

url = "http://localhost:9200"
index = "civile"
pageLimit = 20
bodyPageAllDocBil = {"query": {"bool": {"must": [], "should": []}}, "_source": ["annoruolo", "annosentenza", "cf_giudice", "codiceoggetto", "controparte", "gradogiudizio", "nomegiudice", "parte", "distretto"]}

es = Elasticsearch(url)  # one client is enough; reuse it for every request

# The count request only needs the query. Build a new dict instead of
# aliasing bodyPageAllDocBil, otherwise pop() would also strip "_source"
# from the search body below.
bodyCountAllDoc = {"query": bodyPageAllDocBil["query"]}
res = es.count(index=index, body=bodyCountAllDoc)
sizeCount = res["count"]

# Copy the base body and add the page size.
bodyPageAllDocs = dict(bodyPageAllDocBil, size=pageLimit)
docs = es.search(index=index, body=bodyPageAllDocs, scroll='10m')
currentSize = pageLimit  # records consumed so far (the first page)
scrollId = docs["_scroll_id"]

page = 51711
paginationStart = (page - 1) * pageLimit
# Scroll one page at a time until the requested page has been fetched.
while currentSize <= paginationStart:
    docs = es.scroll(scroll_id=scrollId, scroll='10m')
    countRec = len(docs["hits"]["hits"])
    if countRec == 0:
        break  # ran out of documents before reaching the requested page
    if currentSize == paginationStart:
        print(docs["hits"]["hits"][0])
        print(docs["hits"]["hits"][1])
        #...
    currentSize = currentSize + countRec
    scrollId = docs['_scroll_id']
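The loop above only prints the page when currentSize lands exactly on paginationStart, which holds only while the scroll batch size equals the page size. As a rough sketch, assuming batches arrive as lists of hits in a stable order, the hypothetical helper below cuts an arbitrary record range out of a stream of batches, so the scroll size could be raised to reduce the number of round trips. take_range and the fake batches are illustrative only; no Elasticsearch call is made here.

```python
from typing import Iterable, List


def take_range(batches: Iterable[List[dict]], start: int, count: int) -> List[dict]:
    """Collect `count` records beginning at 0-based offset `start`.

    `batches` yields lists of hits, e.g. docs["hits"]["hits"] from the
    initial search and from each following scroll call.
    """
    wanted: List[dict] = []
    seen = 0  # number of records consumed before the current batch
    for batch in batches:
        if not batch:
            break  # an empty batch means the scroll is exhausted
        batch_end = seen + len(batch)
        if batch_end > start:
            # Overlap between this batch and [start, start + count).
            lo = max(start - seen, 0)
            hi = min(start + count - seen, len(batch))
            wanted.extend(batch[lo:hi])
        seen = batch_end
        if seen >= start + count:
            break  # everything requested has been collected
    return wanted


# Stand-in for scroll responses: 10 batches of 7 fake hits each.
fake_hits = [{"_id": str(i)} for i in range(70)]
fake_batches = [fake_hits[i:i + 7] for i in range(0, 70, 7)]

page = take_range(fake_batches, start=23, count=5)
print([hit["_id"] for hit in page])  # ['23', '24', '25', '26', '27']
```

With the client from the question, the batches could come from a small generator that yields docs["hits"]["hits"] after es.search and after each es.scroll call. This does not avoid reading the preceding records; it only decouples the batch size from the page size.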