Elasticsearch not returning data when the page size is large
Size of data to fetch: around 20,000
Problem: I am searching Elasticsearch index data from Python using the following command, but I am getting no results.
from pyelasticsearch import ElasticSearch

es_repo = ElasticSearch(settings.ES_INDEX_URL)
search_results = es_repo.search(
    query, index=advertiser_name, es_from=_from, size=_size)
It works fine if I give a size less than or equal to 10,000, but not for 20,000. Please help me find an optimal solution.
PS: Digging deeper into ES, I found this error message:
Result window is too large, from + size must be less than or equal to: [10000] but was [19999]. See the scroll api for a more efficient way to request large data sets.
This is probably an Elasticsearch constraint: the index.max_result_window index setting, which defaults to 10,000.
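If you only need to go slightly past 10,000 hits, one option is to raise that limit for the affected index. This has a real memory cost, since Elasticsearch must materialize up to from + size hits per shard for every deep page. A minimal sketch, assuming the official elasticsearch-py client and the placeholder index name "your_index_name" used later in this answer (it requires a running cluster, so treat it as a configuration change, not something to run blindly):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Raise the result window for one index only. Every from + size page
# up to this value must fit in memory on each shard, so keep it modest.
es.indices.put_settings(
    index="your_index_name",
    body={"index": {"max_result_window": 20000}},
)
```

For anything much beyond a few tens of thousands of documents, search_after or the scroll API (both shown below in the original answer) remain the better tools.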
The best solution for real-time use is the search_after query. You only need a date field plus another field that uniquely identifies a document - the _id field or the _uid field is enough.
Try something like this. In my example I want to extract all documents belonging to a single user - the user field has a keyword datatype:
from elasticsearch import Elasticsearch

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

# Count the matching documents first, so we know when to stop paging
body2 = {
    "query": {
        "term": {"user": user}
    }
}
res = es.count(index=es_index, doc_type=documento, body=body2)
size = res['count']

# First page, sorted on the date field with _uid as a unique tiebreaker
body = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}
result = es.search(index=es_index, doc_type=documento, body=body)
# Bookmark = the sort values of the last hit of the page
bookmark = [result['hits']['hits'][-1]['sort'][0],
            str(result['hits']['hits'][-1]['sort'][1])]
body1 = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "search_after": bookmark,
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}
while len(result['hits']['hits']) < size:
    res = es.search(index=es_index, doc_type=documento, body=body1)
    for el in res['hits']['hits']:
        result['hits']['hits'].append(el)
    # Advance the bookmark to the sort values of the last hit of this page
    bookmark = [res['hits']['hits'][-1]['sort'][0],
                str(res['hits']['hits'][-1]['sort'][1])]
    body1 = {
        "size": 10,
        "query": {
            "term": {"user": user}
        },
        "search_after": bookmark,
        "sort": [
            {"date": "asc"},
            {"_uid": "desc"}
        ]
    }
You will then find all the documents appended to the result variable.
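The control flow above - fetch a page, remember the last hit's sort values, pass them back as search_after - can be exercised without a cluster. Here is a minimal sketch with a fake paged source standing in for es.search (the dataset, page size, and fake_search helper are all invented for illustration), purely to show the bookmark loop:

```python
# Fake dataset: (date, uid) pairs already in the sort order ES would return.
docs = [(day, "uid%02d" % day) for day in range(1, 26)]  # 25 "documents"

def fake_search(page_size, search_after=None):
    """Stand-in for es.search: return the next page after the bookmark."""
    start = 0
    if search_after is not None:
        # Skip everything up to and including the bookmarked sort values.
        start = next(i + 1 for i, d in enumerate(docs)
                     if [d[0], d[1]] == search_after)
    return docs[start:start + page_size]

collected = []
bookmark = None
while len(collected) < len(docs):
    page = fake_search(10, search_after=bookmark)
    collected.extend(page)
    last = page[-1]
    bookmark = [last[0], last[1]]  # sort values of the last hit
```

After three iterations (pages of 10, 10, and 5) every document has been collected exactly once, which is the same guarantee the real search_after loop relies on: the sort key (date plus a unique tiebreaker) totally orders the hits, so no page can skip or repeat a document.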
If instead you want to use a scroll query - documentation here:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

body = {
    "query": {
        "term": {"user": user}
    }
}

# helpers.scan wraps the scroll API and yields every matching document
res = helpers.scan(
    client=es,
    scroll='2m',
    query=body,
    index=es_index)

for i in res:
    print(i)