Django: using an annotated aggregate in queryset update()

Question

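I've run into an interesting situation in a new app I've added to an existing project. My goal is to (using a Celery task) update many rows at once with a value that includes annotated aggregated values from foreign-keyed objects. Here are some example models that I've used in previous questions: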

from django.conf import settings
from django.db import models


class Book(models.Model):
    author = models.CharField(max_length=255)
    num_pages = models.IntegerField()
    num_chapters = models.IntegerField()


class UserBookRead(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # string reference, because UserBookStats is defined further down
    user_book_stats = models.ForeignKey('UserBookStats', on_delete=models.CASCADE)
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    complete = models.BooleanField(default=False)
    pages_read = models.IntegerField()


class UserBookStats(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    total_pages_read = models.IntegerField()

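I am attempting to:

1. Use a post_save signal from Book instances to update pages_read on related UserBookRead objects when a Book's page count is updated.
2. At the end of the signal handler, launch a background Celery task to roll up pages_read from each UserBookRead that was updated, and update total_pages_read on each related UserBookStats (this is where the problem occurs).

I'm trying to keep the number of queries as low as possible. Step 1 is complete and requires only a few queries for my actual use case, which seems acceptable for a signal handler as long as those queries are properly optimized.

Step 2 is more involved, hence the delegation to a background task. I've managed to accomplish most of it in a fairly clean manner (well, for me at least).

The problem I'm running into is that when I annotate the UserBookStats queryset with a total_pages aggregation (the Sum() of pages_read across all related UserBookRead objects), I can't follow that with a straight update of the queryset to set the total_pages_read field.

Here's the code (the Book instance is passed to the task as book):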

from django.db.models import F, Sum

# use the provided book instance to get the stats which need to be updated
book_read_objects = UserBookRead.objects.filter(book=book)
book_stat_objects = UserBookStats.objects.filter(id__in=book_read_objects.values_list('user_book_stats__id', flat=True).distinct())

# annotate top level stats objects with summed page count
# (with the models above, the default reverse lookup name is `userbookread`)
book_stat_objects = book_stat_objects.annotate(total_pages=Sum(F('userbookread__pages_read')))

# update the objects with that sum
book_stat_objects.update(total_pages_read=F('total_pages'))

When the last line is executed, this error is thrown:

django.core.exceptions.FieldError: Aggregate functions are not allowed in this query
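
For context, databases generally refuse an aggregate function placed directly in the SET clause of an UPDATE, and Django raises this FieldError before the statement is ever sent. The following is only a rough illustration using the standard-library sqlite3 module, not Django itself; the `stats` table is a hypothetical stand-in for UserBookStats, and the sketch sums the table's own column simply because a plain UPDATE has no joined rows to aggregate over.

```python
import sqlite3

# Hypothetical stand-in for the UserBookStats table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (
        id INTEGER PRIMARY KEY,
        total_pages_read INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO stats (id, total_pages_read) VALUES (1, 10), (2, 40);
""")

# An aggregate used directly in SET is rejected by the database.
try:
    conn.execute("UPDATE stats SET total_pages_read = SUM(total_pages_read)")
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("database refused the statement:", exc)

# The rows are left untouched by the failed statement.
unchanged = dict(conn.execute("SELECT id, total_pages_read FROM stats ORDER BY id"))
print(unchanged)
```

The aggregate therefore has to be computed somewhere the database allows it, such as a subquery, which is the direction the rest of this thread takes.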

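After some research, I found an existing Django ticket for this use case, and its last comment mentions 2 new features in 1.11 that could make it possible.

Is there any known/accepted way to accomplish this use case, perhaps using Subquery or OuterRef? I haven't had any success trying to fold the aggregation in as a Subquery. The fallback here is: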

for obj in book_stat_objects:
    obj.total_pages_read = obj.total_pages
    obj.save()

But with potentially tens of thousands of records in book_stat_objects, I'm really trying to avoid issuing an UPDATE for each one individually.

Answer 1

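I eventually figured out how to do this with Subquery and OuterRef, but I had to take a different approach than I originally expected.

I was able to get a Subquery working quickly, but when I used it to annotate the parent query, I noticed that every annotated value was the first result of the subquery. That is when I realized I needed OuterRef, because the generated SQL wasn't restricting the subquery by anything in the parent query.

The Django docs were super helpful here, as was a related Stack Overflow question. The process boils down to this: use a Subquery to create the aggregation, and use OuterRef to make sure the subquery restricts the aggregated rows by the parent query's PK. At that point, you can annotate with the aggregated value and use it directly in a queryset update().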
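
To make the role of OuterRef more concrete, here is a minimal sketch of roughly the SQL shape involved, written against the standard-library sqlite3 module rather than Django. The `stats` and `reads` tables are hypothetical stand-ins for UserBookStats and UserBookRead, and the SQL Django actually emits will differ in its details. SQLite happens to take the first row of a multi-row scalar subquery; other databases may raise an error instead.

```python
import sqlite3

# Hypothetical stand-ins: stats ~ UserBookStats, reads ~ UserBookRead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (
        id INTEGER PRIMARY KEY,
        total_pages_read INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE reads (
        id INTEGER PRIMARY KEY,
        stats_id INTEGER NOT NULL REFERENCES stats(id),
        pages_read INTEGER NOT NULL
    );
    INSERT INTO stats (id) VALUES (1), (2), (3);
    INSERT INTO reads (stats_id, pages_read) VALUES (1, 10), (1, 15), (2, 40);
""")


def totals():
    """Return {stats id: total_pages_read} for every stats row."""
    return dict(conn.execute("SELECT id, total_pages_read FROM stats ORDER BY id"))


# Without a reference to the outer row, the subquery is the same for
# every stats row, so every row receives the same value.
conn.execute("""
    UPDATE stats SET total_pages_read = COALESCE(
        (SELECT SUM(pages_read) FROM reads GROUP BY stats_id), 0)
""")
uncorrelated = totals()

# Restricting the subquery by the outer row's primary key (the job
# OuterRef('pk') does in the ORM) gives each row its own sum.
conn.execute("""
    UPDATE stats SET total_pages_read = COALESCE(
        (SELECT SUM(r.pages_read) FROM reads AS r
         WHERE r.stats_id = stats.id
         GROUP BY r.stats_id), 0)
""")
correlated = totals()

print(uncorrelated)  # every row carries the same total
print(correlated)    # {1: 25, 2: 40, 3: 0}
```

The COALESCE plays the same part as Coalesce in the ORM: a stats row with no matching reads gets 0 instead of NULL.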

As I mentioned in the question, the code examples are made up. I've tried to adapt them to my actual use case with my changes:

from django.db.models import F, OuterRef, Subquery, Sum
from django.db.models.functions import Coalesce

# create the queryset to use as the subquery, restrict based on the `book_stat_objects` queryset
# (assumed here to be the plain filtered queryset, without the Sum annotation from the question)
book_reads = UserBookRead.objects.filter(user_book_stats__in=book_stat_objects, user_book_stats=OuterRef('pk')).values('user_book_stats')
# annotate the future subquery with the aggregation of pages_read from each UserBookRead,
# then keep only that column, since a Subquery used as a value must return a single column
total_pages = book_reads.annotate(total=Sum(F('pages_read'))).values('total')
# annotate each stat object with the subquery total (0 when there are no matching rows)
book_stats = book_stat_objects.annotate(total=Coalesce(Subquery(total_pages), 0))
# update each row with the new total pages count
book_stats.update(total_pages_read=F('total'))

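It felt strange to create a queryset that can't be used on its own (trying to evaluate book_reads throws an error because it contains an OuterRef), but once you examine the final SQL generated for book_stats, it makes sense.

EDIT

I ended up running into a bug with this code a week or two after figuring out this answer. It turned out to be caused by a default ordering on the UserBookRead model. As the Django docs state, default ordering is incorporated into the GROUP BY clause of any aggregation, so all of my aggregates were off. The solution was to clear the default ordering with a blank order_by() when creating the base subquery: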

book_reads = UserBookRead.objects.filter(user_book_stats__in=book_stat_objects, user_book_stats=OuterRef('pk')).values('user_book_stats').order_by()
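
The effect of an ordering column leaking into GROUP BY can be sketched with plain SQL, again using the standard-library sqlite3 module rather than Django. The `created` column is a hypothetical example of a field named in a model's default ordering; the answer does not say which field was actually involved.

```python
import sqlite3

# Hypothetical stand-in for UserBookRead with an extra `created` column,
# playing the role of a field used in the model's default ordering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reads (
        id INTEGER PRIMARY KEY,
        stats_id INTEGER NOT NULL,
        created TEXT NOT NULL,
        pages_read INTEGER NOT NULL
    );
    INSERT INTO reads (stats_id, created, pages_read) VALUES
        (1, '2017-01-01', 10),
        (1, '2017-01-02', 15),
        (1, '2017-01-03', 5);
""")

# Ordering column included in GROUP BY: one group per (stats_id, created),
# so the "sum" per stats_id is split across several rows.
with_ordering = conn.execute("""
    SELECT stats_id, SUM(pages_read) FROM reads
    GROUP BY stats_id, created
    ORDER BY created
""").fetchall()

# Ordering cleared (what the blank order_by() achieves): one group per stats_id.
without_ordering = conn.execute("""
    SELECT stats_id, SUM(pages_read) FROM reads
    GROUP BY stats_id
""").fetchall()

print(with_ordering)     # [(1, 10), (1, 15), (1, 5)]
print(without_ordering)  # [(1, 30)]
```

When the split result is used as a scalar subquery, only one of those partial sums ends up in the update, which matches the symptom of aggregates being off. As far as I know, newer Django releases (3.1 onward) no longer apply Meta.ordering to such grouped queries, but an explicit blank order_by() remains harmless.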

django

django-orm