How do you filter search results in Wagtail based on a ManyToManyField?

I have a Wagtail site that defines an Event model. Each event can have multiple event sponsors, which are related through a ManyToManyField on the EventSponsor model:

class Event(index.Indexed, ClusterableModel):

    title       = models.CharField(max_length=255)
    start_date  = models.DateTimeField()
    end_date    = models.DateTimeField(null=True, blank=True)
    description = RichTextField(blank=True)

    search_fields = [
        index.SearchField('title', partial_match=True, boost=2.0),
        index.SearchField('description'),
        index.RelatedFields('sponsors', [
            index.SearchField('name', partial_match=True)
        ]),

        index.FilterField('end_date'),
        index.FilterField('sponsors'),
    ]

class EventSponsor(index.Indexed, models.Model):

    sponsor_id = models.IntegerField()
    name = models.CharField(max_length=255)
    url = models.URLField(blank=True)

    events = models.ManyToManyField(Event, related_name='sponsors')

    search_fields = [
        index.SearchField('name', partial_match=True),
    ]

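In addition, the different sites on my Wagtail server include events in their calendars based on a selected set of event sponsors that is specific to each site.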

So building the calendar listing queryset for each site looks like this:

def get_events_for_current_site(request, listing):
    try:
        event_sponsor_settings = EventSponsorSettings.objects.get(site=request.site)
    except EventSponsorSettings.DoesNotExist:
        # If there's no EventSponsorSettings for this Site, return an empty QuerySet. This shouldn't really ever happen.
        return Event.objects.none()

    # Return the selected Events ordered by start date (ascending for upcoming, descending for past).
    query = Event.objects.filter(sponsors__in=event_sponsor_settings.selected_event_sponsors)
    if listing == 'upcoming_events':
        return query.order_by('start_date').filter(end_date__gte=timezone.now())
    else:
        return query.order_by('-start_date').filter(end_date__lt=timezone.now())

event_sponsor_settings.selected_event_sponsors is a list of EventSponsor objects. This queryset works fine for the listing pages.

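I need the search function on each site (using the Elasticsearch backend) to include only the events that would appear on the current site's calendar. So I want my base queryset to be the same one the calendar page uses (or at least to be filtered the same way). My event search code therefore basically calls: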

backend.search(search_query, get_events_for_current_site())

However, I've run into two problems:

1) If I use index.FilterField('sponsors') in Event.search_fields, I get this error when I run manage.py update_index:

Traceback (most recent call last):
  File "./manage.py", line 33, in <module>
    execute_from_command_line(argv)
  File "/multitenant-ve/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/multitenant-ve/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/multitenant-ve/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/multitenant-ve/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/management/commands/update_index.py", line 120, in handle
    self.update_backend(backend_name, schema_only=options.get('schema_only', False))
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/management/commands/update_index.py", line 77, in update_backend
    index.add_model(model)
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py", line 536, in add_model
    index=self.name, doc_type=mapping.get_document_type(), body=mapping.get_mapping()
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py", line 137, in get_mapping
    self.get_field_mapping(field) for field in self.model.get_search_fields()
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py", line 137, in <genexpr>
    self.get_field_mapping(field) for field in self.model.get_search_fields()
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py", line 119, in get_field_mapping
    return self.get_field_column_name(field), mapping
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py", line 72, in get_field_column_name
    return field.get_attname(self.model) + '_filter'
  File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/index.py", line 178, in get_attname
    return field.attname
AttributeError: 'ManyToManyRel' object has no attribute 'attname'

2) If I take out index.FilterField('sponsors'), manage.py update_index works, but I get this error when searching:

Cannot filter search results with field "eventsponsor_id". Please add index.FilterField('eventsponsor_id') to Event.search_fields.

So I tried adding index.FilterField('eventsponsor_id'), which produces this warning during update_index: Event.search_fields contains field 'eventsponsor_id' but it doesn't exist, and causes this traceback when searching:

Traceback:
File "/multitenant-ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  174.                     response = self.process_exception_by_middleware(e, request)
File "/multitenant-ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  172.                     response = response.render()
File "/multitenant-ve/lib/python2.7/site-packages/django/template/response.py" in render
  160.             self.content = self.rendered_content
File "/multitenant-ve/lib/python2.7/site-packages/django/template/response.py" in rendered_content
  137.         content = template.render(context, self._request)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/backends/django.py" in render
  95.             return self.template.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render
  206.                     return self._render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in _render
  197.         return self.nodelist.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render
  992.                 bit = node.render_annotated(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render_annotated
  959.             return self.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/loader_tags.py" in render
  173.         return compiled_parent._render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in _render
  197.         return self.nodelist.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render
  992.                 bit = node.render_annotated(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render_annotated
  959.             return self.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/loader_tags.py" in render
  173.         return compiled_parent._render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in _render
  197.         return self.nodelist.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render
  992.                 bit = node.render_annotated(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render_annotated
  959.             return self.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/loader_tags.py" in render
  69.                 result = block.nodelist.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render
  992.                 bit = node.render_annotated(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render_annotated
  959.             return self.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/defaulttags.py" in render
  220.                     nodelist.append(node.render_annotated(context))
File "/multitenant-ve/lib/python2.7/site-packages/django/template/base.py" in render_annotated
  959.             return self.render(context)
File "/multitenant-ve/lib/python2.7/site-packages/django/template/defaulttags.py" in render
  325.             if match:
File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/base.py" in __len__
  174.         return len(self.results())
File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/base.py" in results
  137.             self._results_cache = self._do_search()
File "/multitenant-ve/src/wagtail/wagtail/wagtailsearch/backends/elasticsearch.py" in _do_search
  452.         hits = self.backend.es.search(**params)
File "/multitenant-ve/lib/python2.7/site-packages/elasticsearch/client/utils.py" in _wrapped
  69.             return func(*args, params=params, **kwargs)
File "/multitenant-ve/lib/python2.7/site-packages/elasticsearch/client/__init__.py" in search
  531.             doc_type, '_search'), params=params, body=body)
File "/multitenant-ve/lib/python2.7/site-packages/elasticsearch/transport.py" in perform_request
  273.             body = self.serializer.dumps(body)
File "/multitenant-ve/lib/python2.7/site-packages/elasticsearch/serializer.py" in dumps
  47.             raise SerializationError(data, e)

Exception Type: SerializationError at /search
Exception Value: ({u'query': {u'filtered': {u'filter': {u'and': [{u'prefix': {u'content_type': u'event'}}, {'and': [{u'terms': {u'eventsponsor_id_filter': [<EventSponsor: Division of Geological and Planetary Sciences (9003)>]}}, {u'range': {u'end_date_filter': {'gte': datetime.datetime(2017, 3, 29, 0, 42, 7, 462939, tzinfo=<UTC>)}}}]}]}, u'query': {u'multi_match': {u'query': u'geo', u'fields': [u'_all', u'_partials']}}}}}, TypeError("Unable to serialize <EventSponsor: Division of Geological and Planetary Sciences (9003)> (type: <class 'templated_cms.models.events.EventSponsor'>)",))

So I tried changing the queryset in get_events_for_current_site() to Event.objects.filter(sponsors__id__in=[s.id for s in event_sponsor_settings.selected_event_sponsors])

That fixed the error... but I don't get any search results.
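
The SerializationError above appears to come down to the query body containing a model instance, which a JSON encoder cannot encode, whereas plain integer IDs can be encoded. Here is a minimal sketch of that difference using only the standard library; FakeSponsor and build_terms_filter are hypothetical stand-ins, not part of Wagtail or the Elasticsearch client, and the dict shape is only loosely modelled on the fragment shown in the traceback:

```python
import json


class FakeSponsor:
    """Hypothetical stand-in for an EventSponsor model instance."""

    def __init__(self, pk, name):
        self.id = pk
        self.name = name


def build_terms_filter(values):
    # Loosely modelled on the 'terms' fragment visible in the traceback.
    return {"terms": {"eventsponsor_id_filter": values}}


def can_serialize(body):
    # json.dumps raises TypeError for objects it does not know how to encode.
    try:
        json.dumps(body)
        return True
    except TypeError:
        return False


selected = [FakeSponsor(1, "Geology"), FakeSponsor(2, "Physics")]

# Passing the objects themselves fails; passing their integer IDs works.
objects_ok = can_serialize(build_terms_filter(selected))
ids_ok = can_serialize(build_terms_filter([s.id for s in selected]))
```

This only explains why the error went away; it says nothing about whether the indexed field actually contains values to match against.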

I'm completely at a loss as to how to deal with this. :(

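First of all, this post helped me solve this problem, so thank you very much.

FilterFields are great for running filters on search results. In this case, we only need to build the search results from an already-filtered queryset.

My workaround is as follows: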

  1. Collect the IDs of the events you want the search results to be drawn from.

    event_ids = get_events_for_current_site().values_list('id', flat=True)

  2. Build a new queryset based on those IDs.

    filtered_events = Event.objects.filter(id__in=event_ids)

  3. Pass the new queryset to your search.

    backend.search(search_query, filtered_events)

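Because the queryset passed to the search is filtered on id, you need to include index.FilterField('id') in Event.search_fields and update your index.

Note that I didn't test the code from the question specifically, only my own variation of it.

Also, this Wagtail support post gave me some insight into solving this problem: https://groups.google.com/forum/#!msg/wagtail/k2-E4h2oLtI/uPOzbuwKBgAJ

That post does come with a caveat, stating that this approach "shouldn't hit performance too badly as long as you don't have 1000s of [items]".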
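
The idea behind the workaround can be mimicked in plain Python: the relational (many-to-many) filtering happens up front, and the search step only has to filter on a scalar id. This is a rough sketch with in-memory data, not Wagtail code; EVENTS, events_for_site and search are hypothetical stand-ins for the database, get_events_for_current_site() and backend.search() respectively:

```python
# Hypothetical in-memory "table" of events and their sponsors.
EVENTS = [
    {"id": 1, "title": "Geology seminar", "sponsor_ids": [9003]},
    {"id": 2, "title": "Geology field trip", "sponsor_ids": [9004]},
    {"id": 3, "title": "Physics colloquium", "sponsor_ids": [9003]},
]


def events_for_site(selected_sponsor_ids):
    """Stand-in for the listing queryset: filters on the many-to-many link."""
    selected = set(selected_sponsor_ids)
    return [e for e in EVENTS if selected & set(e["sponsor_ids"])]


def search(query, allowed_ids):
    """Stand-in for the search backend: it only needs to filter on 'id'."""
    allowed = set(allowed_ids)
    return [
        e for e in EVENTS
        if query.lower() in e["title"].lower() and e["id"] in allowed
    ]


# Step 1: collect the IDs of the events this site may show.
event_ids = [e["id"] for e in events_for_site([9003])]

# Steps 2 and 3: search, restricted to those IDs.
results = search("geo", event_ids)
```

The list of IDs grows with the number of events a site can show, which is presumably what the performance caveat quoted above refers to.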

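For those who run into this problem in the future, here is how I ultimately solved it (the code has been pared down to the bare minimum to show the mechanism I used):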

class Event(index.Indexed, ClusterableModel):

    title       = models.CharField(max_length=255)
    start_date  = models.DateTimeField()
    end_date    = models.DateTimeField(null=True, blank=True)
    description = RichTextField(blank=True)
    lecture_series = models.ForeignKey(
        'this_app.LectureSeries', null=True, blank=True, related_name='events', 
        on_delete=models.SET_NULL
    )

    search_fields = [
        ...
        # We use a Filterfield on lecture_series here because we apparently can't do it 
        # on lecture_series_id for whatever reason. This means we need to filter Events
        # on their lecture_series directly on all querysets that will get used as a 
        # search filter.
        index.FilterField('lecture_series'),
        # We can't filter directly on a ManyToMany relationship, so we need to be a bit
        # creative. This uses the sponsor_id() method defined below to add our 
        # EventSponsors' sponsor_ids to the search index.
        index.FilterField('sponsor_id'),
    ]

    def sponsor_id(self):
        """
        Adds all of our EventSponsors' sponsor_ids to the search filter list.
        """
        return list(self.sponsors.all().values_list('sponsor_id', flat=True))


class EventSponsor(index.Indexed, models.Model):

    sponsor_id = models.IntegerField()
    name = models.CharField(max_length=255)

    events = models.ManyToManyField(Event, related_name='sponsors')

    search_fields = [
        index.SearchField('name', partial_match=True),
    ]


class LectureSeries(index.Indexed, models.Model):

    lecture_series_id = models.IntegerField(unique=True)
    name = models.CharField(max_length=255)

    search_fields = [
        index.SearchField('name', partial_match=True),
    ]


def get_base_events_queryset_for_site(site):
    """
    Returns the base queryset object from which all Event listings for a specified Site
    must be derived.
    This function filters the list of Event objects down to just those that the Site's
    admins have chosen to display.
    """
    try:
        settings = EventListingSettings.objects.get(site=site)
    except EventListingSettings.DoesNotExist:
        # If there's no EventListingSettings for this Site, return an empty QuerySet.
        return Event.objects.none()

    # We need to do the sponsors via their sponsor_ids because searches can't be filtered
    # directly on a ManyToMany relationship.
    sponsor_ids = [sponsor.sponsor_id for sponsor in settings.event_sponsors.all()]

    # We need to split these up for Sites which import either no LectureSeries or no 
    # EventSponsors. Listings will get dupes, and searches will crash if we don't.
    if settings.lecture_series.exists() and sponsor_ids:
        queryset = Event.objects.filter(
            Q(sponsors__sponsor_id__in=sponsor_ids) | 
            Q(lecture_series__in=settings.lecture_series.all())
        )
    elif sponsor_ids:
        queryset = Event.objects.filter(sponsors__sponsor_id__in=sponsor_ids)
    else:
        queryset = Event.objects.filter(lecture_series__in=settings.lecture_series.all())

    return queryset

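As you can see, regular foreign keys can be used to filter searches normally, but many-to-many relationships need some special ID-list code in order to build a queryset that can be converted into an ElasticSearch query.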
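
To illustrate why indexing a list of IDs works, here is a small plain-Python sketch of the matching behaviour: a terms-style filter matches a document when any of the values stored in the field is among the requested values. FakeEvent, to_document and terms_match are hypothetical helpers written for this illustration, and the sponsor_id_filter key simply follows the "_filter" suffix visible in the traceback in the question:

```python
class FakeEvent:
    """Hypothetical stand-in for Event, exposing a sponsor_id() method."""

    def __init__(self, pk, sponsor_ids):
        self.pk = pk
        self._sponsor_ids = sponsor_ids

    def sponsor_id(self):
        # Mirrors the idea of returning every related sponsor_id as a list.
        return list(self._sponsor_ids)


def to_document(event):
    # Roughly what gets stored for a FilterField backed by a method.
    return {"pk": event.pk, "sponsor_id_filter": event.sponsor_id()}


def terms_match(document, field, wanted):
    # A list-valued field matches if ANY of its values is wanted.
    values = document[field]
    if not isinstance(values, list):
        values = [values]
    return any(value in wanted for value in values)


events = [FakeEvent(1, [9003, 9004]), FakeEvent(2, [9005]), FakeEvent(3, [])]
documents = [to_document(event) for event in events]

# Equivalent in spirit to filtering with sponsors__sponsor_id__in=[9003, 9005].
matching = [
    doc["pk"] for doc in documents
    if terms_match(doc, "sponsor_id_filter", {9003, 9005})
]
```

An event with no sponsors stores an empty list and therefore never matches, which is one reason to skip the sponsor clause entirely when a site has selected no sponsors.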

I had a similar problem with searching/filtering by text together with many fields from a form.

My workaround:

  1. Run the ElasticSearch search on the model.
  2. Convert the results (DatabaseSearchResults) into a queryset.
  3. Apply the many-to-many filters to the queryset, using the data from the form.

For example:

results = MyModel.search(search_terms, fields=['title', 'body'], operator='or')
qs = results.get_queryset()

m2m_objects = self.cleaned_data.get('m2m_field')
qs = qs.filter(m2m_field__in=m2m_objects)