Python elasticsearch-dsl django pagination
How can I use Django pagination with elasticsearch-dsl?
My code:
from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import MultiMatch

query = MultiMatch(query=q, fields=['title', 'body'], fuzziness='AUTO')
s = Search(using=elastic_client, index='post').query(query).sort('-created_at')
response = s.execute()
# this always returns page count 1
paginator = Paginator(response, 100)
page = request.GET.get('page')
try:
    posts = paginator.page(page)
except PageNotAnInteger:
    posts = paginator.page(1)
except EmptyPage:
    posts = paginator.page(paginator.num_pages)
Is there a solution for this?
I found this paginator at this link:
from django.core.paginator import Paginator, Page


class DSEPaginator(Paginator):
    """
    Override Django's built-in Paginator class to take in a count/total number of items;
    Elasticsearch provides the total as a part of the query results, so we can minimize hits.
    """

    def __init__(self, *args, **kwargs):
        super(DSEPaginator, self).__init__(*args, **kwargs)
        # Note: this relies on Paginator caching its total in `_count`, as older
        # Django releases did. Where `count` is a cached property, override
        # `count` instead. On Elasticsearch 7+, the total is `hits.total.value`.
        self._count = self.object_list.hits.total

    def page(self, number):
        # this is overridden to prevent any slicing of the object_list - Elasticsearch has
        # returned the sliced data already.
        number = self.validate_number(number)
        return Page(self.object_list, number, self)
Then in the view I use:
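from django.conf import settings
from django.core.paginator import PageNotAnInteger, EmptyPage
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import MultiMatch

q = request.GET.get('q', None)
page = int(request.GET.get('page', '1'))
# Slice with the same page size the paginator uses, so the page count
# matches the hits that Elasticsearch actually returns.
per_page = settings.POSTS_PER_PAGE
start = (page - 1) * per_page
end = start + per_page
query = MultiMatch(query=q, fields=['title', 'body'], fuzziness='AUTO')
s = Search(using=elastic_client, index='post').query(query)[start:end]
response = s.execute()
paginator = DSEPaginator(response, per_page)
try:
    posts = paginator.page(page)
except PageNotAnInteger:
    posts = paginator.page(1)
except EmptyPage:
    posts = paginator.page(paginator.num_pages)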
This works perfectly.
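The arithmetic behind this approach can be sketched without Django or Elasticsearch. The helper names `page_bounds` and `num_pages` below are hypothetical, for illustration only: the first turns a 1-based page number into the `[start:end]` slice sent to Elasticsearch, and the second derives the page count from the total reported in the response.

```python
import math


def page_bounds(page, per_page):
    """Return the (start, end) slice offsets for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return start, start + per_page


def num_pages(total, per_page):
    """Return how many pages are needed to show `total` hits."""
    # Assumption: an empty result set still renders as one (empty) page.
    return max(1, math.ceil(total / per_page))
```

The slice size and the paginator's page size need to be the same number. If the query fetches ten hits per page while the paginator divides the total by a different page size, the page count no longer matches what is fetched.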
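Another approach is to create a proxy between the Paginator and the Elasticsearch query. Paginator needs two things: __len__ (or count) and __getitem__ (which takes a slice). A rough version of the proxy works like this: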
class ResultsProxy(object):
    """
    A proxy object for returning Elasticsearch results that is able to be
    passed to a Paginator.
    """

    def __init__(self, es, index=None, body=None):
        self.es = es
        self.index = index
        self.body = body

    def __len__(self):
        result = self.es.count(index=self.index,
                               body=self.body)
        return result['count']

    def __getitem__(self, item):
        assert isinstance(item, slice)
        results = self.es.search(
            index=self.index,
            body=self.body,
            from_=item.start,
            size=item.stop - item.start,
        )
        return results['hits']['hits']
An instance of the proxy can be passed to Paginator, and it will make requests to ES as needed.
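To see which requests such a proxy ends up making, here is a small self-contained sketch. `FakeES` is a hypothetical in-memory stand-in for the Elasticsearch client (it is not part of any library), `SliceProxy` is a simplified proxy in the same spirit, and `get_page` imitates only the two operations a paginator performs, `len()` and slicing; it is not Django's Paginator.

```python
class FakeES:
    """Hypothetical in-memory stand-in for an Elasticsearch client."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.calls = []  # records every request, for inspection

    def count(self):
        self.calls.append(("count",))
        return {"count": len(self.docs)}

    def search(self, from_, size):
        self.calls.append(("search", from_, size))
        hits = self.docs[from_:from_ + size]
        return {"hits": {"hits": hits}}


class SliceProxy:
    """Exposes only what a paginator needs: __len__ and slice access."""

    def __init__(self, es):
        self.es = es

    def __len__(self):
        return self.es.count()["count"]

    def __getitem__(self, item):
        if not isinstance(item, slice):
            raise TypeError("only slices are supported")
        start = item.start or 0
        results = self.es.search(from_=start, size=item.stop - start)
        return results["hits"]["hits"]


def get_page(object_list, number, per_page):
    """Imitates a paginator: one len() call, then one slice."""
    total = len(object_list)
    bottom = (number - 1) * per_page
    top = min(bottom + per_page, total)
    return object_list[bottom:top]


es = FakeES(range(25))
page = get_page(SliceProxy(es), number=3, per_page=10)
print(page)      # the last five documents: [20, 21, 22, 23, 24]
print(es.calls)  # [('count',), ('search', 20, 5)]
```

Only the requested page is fetched: one count request for the total, then one search request with the offset and size of that page.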
Following Danielle Madeley's suggestion, I also created a search results proxy that works with django-elasticsearch-dsl==0.4.4, the latest version at the time of writing.
from django.utils.functional import LazyObject


class SearchResults(LazyObject):
    def __init__(self, search_object):
        self._wrapped = search_object

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        search_results = self._wrapped[index]
        if isinstance(index, slice):
            search_results = list(search_results)
        return search_results
Then you can use it in your search view like this:
from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage

paginate_by = 20
search = MyModelDocument.search()
# ... do some filtering ...
search_results = SearchResults(search)
paginator = Paginator(search_results, paginate_by)
page_number = request.GET.get("page")
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    # If page parameter is not an integer, show first page.
    page = paginator.page(1)
except EmptyPage:
    # If page parameter is out of range, show last existing page.
    page = paginator.page(paginator.num_pages)
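Django's LazyObject proxies all attributes and methods of the object assigned to the _wrapped attribute. I override a couple of methods that Django's paginator requires but that do not work directly with Search() instances.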
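The same pattern can be sketched in plain Python, without Django. `Delegate` and `FakeSearch` below are hypothetical names, not Django or elasticsearch-dsl classes, and forwarding through `__getattr__` is a simplification of what LazyObject does. Special methods such as `__len__` are looked up on the class rather than the instance, so `__getattr__` never sees them and they have to be defined explicitly.

```python
class FakeSearch:
    """Hypothetical stand-in for a search object: count() and slicing, no __len__."""

    def __init__(self, docs):
        self._docs = list(docs)

    def count(self):
        return len(self._docs)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return FakeSearch(self._docs[index])  # slicing returns another search
        return self._docs[index]

    def __iter__(self):
        return iter(self._docs)

    def filter_even(self):
        return FakeSearch(d for d in self._docs if d % 2 == 0)


class Delegate:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called when normal lookup fails, so ordinary methods
        # of the wrapped object remain reachable through the proxy.
        return getattr(self._wrapped, name)

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        result = self._wrapped[index]
        if isinstance(index, slice):
            result = list(result)  # materialize the page as a list
        return result


proxy = Delegate(FakeSearch(range(7)))
print(len(proxy))                   # 7
print(proxy[2:5])                   # [2, 3, 4]
print(proxy.filter_even().count())  # 4
```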
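A very simple solution is to use MultipleObjectMixin and fetch the Elastic results in get_queryset() by overriding it. In this case, if you add the paginate_by attribute, Django will handle the pagination itself.
It should look something like this: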
from django.views.generic import ListView


# ListView already inherits from MultipleObjectMixin; listing the mixin
# again before ListView raises an MRO TypeError.
class MyView(ListView):
    paginate_by = 10

    def get_queryset(self):
        object_list = []
        # Query Elastic here and return the response data in `object_list`.
        # If you wish to add filters when querying Elastic,
        # you can use self.request.GET params here.
        return object_list
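Note: the code above is generic and differs from my own case, so I cannot guarantee that it works as is. I use a similar solution by inheriting other mixins, overriding get_queryset(), and taking advantage of Django's built-in pagination, and it works well for me. Since this is a simple fix, I decided to post it here with a similar example.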