Cloudsearch Request Exceed 10,000 Limit

Question

When I run a query that matches more than 10,000 results, I get the following error:

{u'message': u'Request depth (10100) exceeded, limit=10000', u'__type': u'#SearchException', u'error': {u'rid': u'zpXDxukp4bEFCiGqeQ==', u'message': u'[*Deprecated*: Use the outer message field] Request depth (10100) exceeded, limit=10000'}}

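When I search with narrower keywords, or run queries that return fewer results, everything works and no error is returned.

I guess I have to limit the search somehow, but I can't figure out how. My search function looks like this: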

def execute_query_string(self, query_string):
    amazon_query = self.search_connection.build_query(q=query_string, start=0, size=100)

    json_search_results = []
    for json_blog in self.search_connection.get_all_hits(amazon_query):
        json_search_results.append(json_blog)

    results = []
    for json_blog in json_search_results:
        results.append(json_blog['fields']) 

    return results

It is called like this:

results = searcher.execute_query_string(request.GET.get('q', ''))[:100]

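As you can see, I tried to limit the results with the start and size arguments of build_query(). I still get the error, though.

I must have misunderstood how to avoid getting more than 10,000 matches in a search result. Can someone tell me how to do it?

All I can find on this topic is Amazon's Limits page, which says you can only request 10,000 results. It does not say how to limit the request.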

Answer 1

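You are calling get_all_hits, which fetches ALL the results for your query. That is why your size parameter does not limit anything: as far as I can tell it only acts as the page size while get_all_hits keeps requesting further pages. The error presumably appears once a page would reach past the 10,000th hit, which is consistent with the 10100 in the message (start=10000 plus size=100).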

From the docs:

get_all_hits(query) Get a generator to iterate over all search results

Transparently handles the results paging from Cloudsearch search results so even if you have many thousands of results you can iterate over all results in a reasonably efficient manner.
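Since the docs describe get_all_hits as a generator, one option is to stop consuming it early, for example with itertools.islice. The following is only a minimal sketch, assuming the generator fetches pages lazily as the description suggests. paged_hits is a hypothetical stand-in for get_all_hits (not part of boto) so the example runs without AWS access, and first_n_fields is likewise an illustrative helper.

```python
from itertools import islice


def paged_hits(total_hits, page_size, fetched_pages):
    """Hypothetical stand-in for get_all_hits: yields hits page by page.

    Each simulated request appends its start offset to fetched_pages.
    """
    start = 0
    while start < total_hits:
        fetched_pages.append(start)  # one simulated search request
        for i in range(start, min(start + page_size, total_hits)):
            yield {'id': str(i), 'fields': {'title': 'doc %d' % i}}
        start += page_size


def first_n_fields(hits, n):
    """Return the 'fields' of at most n hits, consuming no more than needed."""
    return [hit['fields'] for hit in islice(hits, n)]


fetched_pages = []
results = first_n_fields(paged_hits(50000, 100, fetched_pages), 100)
print(len(results), fetched_pages)
```

In this simulation only the first page is requested, because islice stops pulling from the generator once it has n items. Whether the real get_all_hits behaves the same way depends on the boto version, so treat this as an illustration rather than a guarantee.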

http://boto.readthedocs.org/en/latest/ref/cloudsearch2.html#boto.cloudsearch2.search.SearchConnection.get_all_hits

You should be calling search instead: http://boto.readthedocs.org/en/latest/ref/cloudsearch2.html#boto.cloudsearch2.search.SearchConnection.search
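A rough sketch of what the function from the question could look like with search, which issues a single request for one page of results. It assumes search accepts q, start and size and returns an iterable of hits shaped like {'id': ..., 'fields': {...}}, as the question's code implies. FakeSearchConnection is a hypothetical stand-in used only so the example runs without AWS access; in real code you would pass your own self.search_connection.

```python
def execute_query_string(search_connection, query_string, limit=100):
    """Return the 'fields' of at most `limit` hits for query_string.

    Assumes search_connection.search() accepts q, start and size and
    returns an iterable of hits shaped like {'id': ..., 'fields': {...}}.
    """
    hits = search_connection.search(q=query_string, start=0, size=limit)
    return [hit['fields'] for hit in hits]


class FakeSearchConnection(object):
    """Hypothetical stand-in for boto's SearchConnection (illustration only)."""

    def __init__(self, total_hits):
        self.total_hits = total_hits
        self.requests = []  # records every (q, start, size) request made

    def search(self, q=None, start=0, size=10):
        self.requests.append((q, start, size))
        end = min(start + size, self.total_hits)
        return [{'id': str(i), 'fields': {'title': 'doc %d' % i}}
                for i in range(start, end)]


connection = FakeSearchConnection(total_hits=50000)
results = execute_query_string(connection, 'python')
print(len(results), connection.requests)
```

With this shape the [:100] slice at the call site is no longer needed, since the limit is applied in the request itself.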

python

amazon-web-services

amazon-cloudsearch