Is Django's prefetch_related supposed to work with GenericRelation?

Update: there is an open ticket about this issue: 24272

What's it all about?

Django has a GenericRelation class, which adds a "reverse" generic relationship to enable an additional API.

It turns out we can use this reverse generic relation for filtering or ordering, but we can't use it inside prefetch_related.

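I was wondering whether this is a bug, whether it is simply not supposed to work, or whether it is something that could be implemented as a feature.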

Let me show you what I mean with some examples.

Let's say we have two main models: Movie and Book.

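We want to assign tags to our movies and books, but instead of using separate MovieTag and BookTag models, we want to use a single TaggedItem class with a GFK (GenericForeignKey) pointing to either a Movie or a Book.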

Here is the model structure:

from django.db import models
from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
from django.contrib.contenttypes.models import ContentType


class TaggedItem(models.Model):
    tag = models.SlugField()
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')

    def __unicode__(self):
        return self.tag


class Director(models.Model):
    name = models.CharField(max_length=100)

    def __unicode__(self):
        return self.name


class Movie(models.Model):
    name = models.CharField(max_length=100)
    director = models.ForeignKey(Director)
    tags = GenericRelation(TaggedItem, related_query_name='movies')

    def __unicode__(self):
        return self.name


class Author(models.Model):
    name = models.CharField(max_length=100)

    def __unicode__(self):
        return self.name


class Book(models.Model):
    name = models.CharField(max_length=100)
    author = models.ForeignKey(Author)
    tags = GenericRelation(TaggedItem, related_query_name='books')

    def __unicode__(self):
        return self.name

And some initial data:

>>> from tags.models import Book, Movie, Author, Director, TaggedItem
>>> a = Author.objects.create(name='E L James')
>>> b1 = Book.objects.create(name='Fifty Shades of Grey', author=a)
>>> b2 = Book.objects.create(name='Fifty Shades Darker', author=a)
>>> b3 = Book.objects.create(name='Fifty Shades Freed', author=a)
>>> d = Director.objects.create(name='James Gunn')
>>> m1 = Movie.objects.create(name='Guardians of the Galaxy', director=d)
>>> t1 = TaggedItem.objects.create(content_object=b1, tag='roman')
>>> t2 = TaggedItem.objects.create(content_object=b2, tag='roman')
>>> t3 = TaggedItem.objects.create(content_object=b3, tag='roman')
>>> t4 = TaggedItem.objects.create(content_object=m1, tag='action movie')

As the docs show, we can do things like this:

>>> b1.tags.all()
[<TaggedItem: roman>]
>>> m1.tags.all()
[<TaggedItem: action movie>]
>>> TaggedItem.objects.filter(books__author__name='E L James')
[<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]
>>> TaggedItem.objects.filter(movies__director__name='James Gunn')
[<TaggedItem: action movie>]
>>> Book.objects.all().prefetch_related('tags')
[<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]
>>> Book.objects.filter(tags__tag='roman')
[<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]

But if we try to prefetch some related data of TaggedItem via this reverse generic relation, we get an AttributeError.

>>> TaggedItem.objects.all().prefetch_related('books')
Traceback (most recent call last):
  ...
AttributeError: 'Book' object has no attribute 'object_id'

Some of you may ask why I don't just use content_object instead of books here. The reason is that this only works when we want to:

1) prefetch only one level deep from querysets containing different types of content_object

>>> TaggedItem.objects.all().prefetch_related('content_object')
[<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: action movie>]

2) prefetch many levels deep, but from querysets containing only one type of content_object

>>> TaggedItem.objects.filter(books__author__name='E L James').prefetch_related('content_object__author')
[<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]

But if we want both 1) and 2) (prefetching many levels deep from a queryset containing different types of content_object), we can't use content_object.

>>> TaggedItem.objects.all().prefetch_related('content_object__author')
Traceback (most recent call last):
  ...
AttributeError: 'Movie' object has no attribute 'author_id'

Django assumes that all content_objects are Books, and therefore that they all have an Author.
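A toy model of that failure, in plain Python rather than Django's actual code: the foreign-key attribute to read is chosen once for the whole list and then read from every object. The `Book` and `Movie` classes and the `collect_fk_ids` helper below are hypothetical stand-ins for illustration only.

```python
class Book:
    def __init__(self, author_id):
        self.author_id = author_id


class Movie:
    def __init__(self, director_id):
        self.director_id = director_id


def collect_fk_ids(objects, fk_attname):
    """Read the same foreign-key attribute from every object in the list.

    Hypothetical helper: like a lookup that is resolved once for the whole
    list, it assumes that all objects share a single model.
    """
    return {getattr(obj, fk_attname) for obj in objects}


books_only = [Book(author_id=1), Book(author_id=1)]
mixed = [Book(author_id=1), Movie(director_id=7)]

# Homogeneous list: every object has author_id, so this works.
print(collect_fk_ids(books_only, "author_id"))

# Mixed list: the Movie has no author_id, so the attribute read fails.
try:
    collect_fk_ids(mixed, "author_id")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The second call fails for the same kind of reason as the traceback above: the lookup is valid for a Book but is applied to a Movie as well.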

Now imagine that we want to prefetch not only the books with their author, but also the movies with their director. Here are a few attempts.

The naive way:

>>> TaggedItem.objects.all().prefetch_related(
...     'content_object__author',
...     'content_object__director',
... )
Traceback (most recent call last):
  ...
AttributeError: 'Movie' object has no attribute 'author_id'

Maybe with a custom Prefetch object?

>>> from django.db.models import Prefetch
>>> TaggedItem.objects.all().prefetch_related(
...     Prefetch('content_object', queryset=Book.objects.all().select_related('author')),
...     Prefetch('content_object', queryset=Movie.objects.all().select_related('director')),
... )
Traceback (most recent call last):
  ...
ValueError: Custom queryset can't be used for this lookup.

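Some solutions to this problem are shown here. But they involve a lot of massaging of the data, which I would like to avoid. I really like the API of reverse generic relations, and it would be very nice to be able to do prefetches like this: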

>>> TaggedItem.objects.all().prefetch_related(
...     'books__author',
...     'movies__director',
... )
Traceback (most recent call last):
  ...
AttributeError: 'Book' object has no attribute 'object_id'

Or like this:

>>> TaggedItem.objects.all().prefetch_related(
...     Prefetch('books', queryset=Book.objects.all().select_related('author')),
...     Prefetch('movies', queryset=Movie.objects.all().select_related('director')),
... )
Traceback (most recent call last):
  ...
AttributeError: 'Book' object has no attribute 'object_id'

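But as you can see, we always get the AttributeError. I'm using Django 1.7.3 and Python 2.7.6, and I'm curious why Django throws this error. Why does Django look for an object_id on the Book model? And why do I think this may be a bug? Usually, when we ask prefetch_related to resolve something it can't, we see: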

>>> TaggedItem.objects.all().prefetch_related('some_field')
Traceback (most recent call last):
  ...
AttributeError: Cannot find 'some_field' on TaggedItem object, 'some_field' is an invalid parameter to prefetch_related()

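But here it is different. Django actually tries to resolve the relation... and fails. Is this a bug that should be reported? I have never reported anything to Django, which is why I'm asking here first. I'm unable to trace the error and decide for myself whether this is a bug or a feature that could be implemented.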

If you want to retrieve Book instances and prefetch the related tags, use Book.objects.prefetch_related('tags'). There is no need to use the reverse relation here.

You can also have a look at the related tests in the Django source code.

Also, the Django documentation states that prefetch_related() is supposed to work with GenericForeignKey and GenericRelation:

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey.
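To make "does the 'joining' in Python" concrete, here is a rough sketch with plain dictionaries standing in for database rows. It is not Django's implementation: the `BOOKS` and `TAGS` data (including the made-up 'sequel' tag) and the `prefetch_tags` helper are invented for illustration. One pass plays the role of the second query (an `IN` lookup on the parent ids), and the results are then attached to each parent by key.

```python
from collections import defaultdict

# Stand-ins for two tables; a real prefetch would issue two SQL queries.
BOOKS = [
    {"id": 1, "name": "Fifty Shades of Grey"},
    {"id": 2, "name": "Fifty Shades Darker"},
]
TAGS = [
    {"object_id": 1, "tag": "roman"},
    {"object_id": 2, "tag": "roman"},
    {"object_id": 2, "tag": "sequel"},  # invented row for the example
]


def prefetch_tags(books, tag_rows):
    """Attach a 'tags' list to each book: one lookup plus a join in Python."""
    wanted_ids = {book["id"] for book in books}

    # "Second query": fetch all related rows at once (like an IN lookup).
    related = [row for row in tag_rows if row["object_id"] in wanted_ids]

    # The join in Python: bucket the related rows by their parent key.
    by_parent = defaultdict(list)
    for row in related:
        by_parent[row["object_id"]].append(row["tag"])

    # Attach the buckets; parents without related rows get an empty list.
    for book in books:
        book["tags"] = by_parent.get(book["id"], [])
    return books


for book in prefetch_tags(BOOKS, TAGS):
    print(book["name"], book["tags"])
```

The point of the sketch is that the number of lookups stays constant regardless of how many books there are, instead of one lookup per book.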

Update: To prefetch the content_object for a TaggedItem you can use TaggedItem.objects.all().prefetch_related('content_object'). If you want to limit the result to tagged Book objects only, you can additionally filter on the ContentType (I'm not sure whether prefetch_related works with the related_query_name). If you also want to get the Author together with the book, you need to use select_related(), not prefetch_related(), as this is a ForeignKey relationship. You can combine this in a custom prefetch_related() query:

from django.contrib.contenttypes.models import ContentType
from django.db.models import Prefetch

book_ct = ContentType.objects.get_for_model(Book)
TaggedItem.objects.filter(content_type=book_ct).prefetch_related(
    Prefetch(
        'content_object',  
        queryset=Book.objects.all().select_related('author')
    )
)

prefetch_related_objects to the rescue.

Starting with Django 1.10 (note: the function was present in earlier versions, but was not part of the public API), we can use prefetch_related_objects to divide and conquer the problem.

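prefetch_related is an operation in which Django fetches related data after the queryset has been evaluated (it runs a second query once the main one has been evaluated). For this to work, it expects the items in the queryset to be homogeneous (all of the same type). The main reason the reverse generic relation does not work right now is that we have objects from different content types, and the code is not yet smart enough to separate the flow for the different content types.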

With prefetch_related_objects we run the prefetch only on a subset of the queryset in which all items are homogeneous. Here is an example:

from django.db.models import prefetch_related_objects
from django.core.paginator import Paginator
from django.contrib.contenttypes.models import ContentType
from tags.models import TaggedItem, Book, Movie


tagged_items = TaggedItem.objects.all()
paginator = Paginator(tagged_items, 25)
page = paginator.get_page(1)

# prefetch books with their author
# do this only for items where
# tagged_item.content_object is a Book
book_ct = ContentType.objects.get_for_model(Book)
tags_with_books = [item for item in page.object_list if item.content_type_id == book_ct.id]
prefetch_related_objects(tags_with_books, "content_object__author")

# prefetch movies with their director
# do this only for items where
# tagged_item.content_object is a Movie
movie_ct = ContentType.objects.get_for_model(Movie)
tags_with_movies = [item for item in page.object_list if item.content_type_id == movie_ct.id]
prefetch_related_objects(tags_with_movies, "content_object__director")

# This will make 5 queries in total
# 1 for page items
# 1 for books
# 1 for book authors
# 1 for movies
# 1 for movie directors
# Iterating over the items won't trigger any further queries
for item in page.object_list:
    # do something with item.content_object
    # and item.content_object.author/director
    print(
        item,
        item.content_object,
        getattr(item.content_object, 'author', None),
        getattr(item.content_object, 'director', None)
    )
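The two hand-written blocks above follow the same pattern, so they could be folded into a small helper. The sketch below is one possible shape and is not part of Django: `prefetch_per_content_type` and its parameter names are hypothetical, and the prefetch function is passed in as an argument so that the grouping logic can be exercised without a database.

```python
from collections import defaultdict
from types import SimpleNamespace


def prefetch_per_content_type(items, lookups_by_ct_id, prefetch):
    """Group items by content_type_id and prefetch each homogeneous group.

    items: objects exposing a content_type_id attribute (e.g. TaggedItem).
    lookups_by_ct_id: maps a content type id to a list of lookup strings.
    prefetch: callable(instances, *lookups); with Django this would be
        django.db.models.prefetch_related_objects.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item.content_type_id].append(item)

    for ct_id, group in groups.items():
        lookups = lookups_by_ct_id.get(ct_id)
        if lookups:  # content types without lookups are left untouched
            prefetch(group, *lookups)
    return dict(groups)


# Dry run with fake items and a recording stub instead of a real prefetch.
calls = []
fake_items = [
    SimpleNamespace(content_type_id=1),
    SimpleNamespace(content_type_id=2),
    SimpleNamespace(content_type_id=1),
]
groups = prefetch_per_content_type(
    fake_items,
    {1: ["content_object__author"], 2: ["content_object__director"]},
    lambda instances, *lookups: calls.append((len(instances), lookups)),
)
print(calls)
```

With the models from the question, the call would look something like prefetch_per_content_type(page.object_list, {book_ct.id: ["content_object__author"], movie_ct.id: ["content_object__director"]}, prefetch_related_objects).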

Building on Bernhard's answer: the code snippet at the end of it actually raises the following error in practice:

ValueError: Custom queryset can't be used for this lookup.

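I have overridden GenericForeignKey to actually allow that behavior. I don't yet know how bulletproof this implementation is, but it seems to do what I need, so I'm posting it here in the hope that it helps others. Look for the START CHANGES and END CHANGES markers to see my changes to the original Django code.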

from collections import defaultdict

from django.contrib.contenttypes.fields import GenericForeignKey as BaseGenericForeignKey

class CustomGenericForeignKey(BaseGenericForeignKey):
    def get_prefetch_queryset(self, instances, queryset=None):
        """
        Enable passing queryset to get_prefetch_queryset when using GenericForeignKeys but only works when a single
        content type is being queried
        """
        # START CHANGES
        # if queryset is not None:
        #     raise ValueError("Custom queryset can't be used for this lookup.")
        # END CHANGES

        # For efficiency, group the instances by content type and then do one
        # query per model
        fk_dict = defaultdict(set)
        # We need one instance for each group in order to get the right db:
        instance_dict = {}
        ct_attname = self.model._meta.get_field(self.ct_field).get_attname()
        for instance in instances:
            # We avoid looking for values if either ct_id or fkey value is None
            ct_id = getattr(instance, ct_attname)
            if ct_id is not None:
                fk_val = getattr(instance, self.fk_field)
                if fk_val is not None:
                    fk_dict[ct_id].add(fk_val)
                    instance_dict[ct_id] = instance

        ret_val = []
        for ct_id, fkeys in fk_dict.items():
            instance = instance_dict[ct_id]
            # START CHANGES
            if queryset is not None:
                assert len(fk_dict) == 1  # only a single content type is allowed, else undefined behavior
                ret_val.extend(queryset.filter(pk__in=fkeys))
            else:
                ct = self.get_content_type(id=ct_id, using=instance._state.db)
                ret_val.extend(ct.get_all_objects_for_this_type(pk__in=fkeys))
            # END CHANGES

        # For doing the join in Python, we have to match both the FK val and the
        # content type, so we use a callable that returns a (fk, class) pair.
        def gfk_key(obj):
            ct_id = getattr(obj, ct_attname)
            if ct_id is None:
                return None
            else:
                model = self.get_content_type(id=ct_id,
                                              using=obj._state.db).model_class()
                return (model._meta.pk.get_prep_value(getattr(obj, self.fk_field)),
                        model)

        return (
            ret_val,
            lambda obj: (obj.pk, obj.__class__),
            gfk_key,
            True,
            self.name,
            True,
        )