Django conditional Subquery aggregate

Question

A simplified example of my model structure:

from django.db import models

class Corporation(models.Model):
    ...

class Division(models.Model):
    corporation = models.ForeignKey(Corporation, on_delete=models.CASCADE)

class Department(models.Model):
    division = models.ForeignKey(Division, on_delete=models.CASCADE)
    type = models.IntegerField()

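Now I want to display a table of corporations where one column contains the number of departments of a specific type, e.g. type=10. Currently this is done with a helper method on the Corporation model that retrieves them, for example: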

class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )

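The problem is that this absolutely kills performance because of the N+1 problem: rendering the table runs one extra COUNT query for every corporation.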
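To make the N+1 cost concrete, here is a minimal sketch using Python's standard-library sqlite3 module instead of Django. The foo_* table names come from the raw SQL in the edit further down; the sample rows, the CountingConnection wrapper and both helper functions are hypothetical. The per-row approach issues 1 + N queries, while one grouped query covers every corporation at once.

```python
import sqlite3

# Hypothetical sample data; the foo_* table names follow the question's raw SQL.
SCHEMA = """
CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO foo_corporation VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO foo_division VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO foo_department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 3, 20);
"""


class CountingConnection:
    """Wraps a sqlite3 connection and counts the queries issued through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def counts_n_plus_one(db):
    """One query for the corporations, then one COUNT per corporation (1 + N queries)."""
    result = {}
    for corp_id, name in db.execute("SELECT id, name FROM foo_corporation ORDER BY id").fetchall():
        (n,) = db.execute(
            "SELECT COUNT(*) FROM foo_department dept "
            "JOIN foo_division div ON dept.division_id = div.id "
            "WHERE div.corporation_id = ? AND dept.type = 10",
            (corp_id,),
        ).fetchone()
        result[name] = n
    return result


def counts_batched(db):
    """Two queries in total, no matter how many corporations there are."""
    names = dict(db.execute("SELECT id, name FROM foo_corporation ORDER BY id").fetchall())
    grouped = dict(db.execute(
        "SELECT div.corporation_id, COUNT(*) FROM foo_department dept "
        "JOIN foo_division div ON dept.division_id = div.id "
        "WHERE dept.type = 10 GROUP BY div.corporation_id"
    ).fetchall())
    # Corporations with no type-10 departments produce no group, so default to 0.
    return {name: grouped.get(corp_id, 0) for corp_id, name in names.items()}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
slow, fast = CountingConnection(conn), CountingConnection(conn)
print(counts_n_plus_one(slow), slow.queries)  # {'Acme': 3, 'Globex': 0, 'Initech': 0} 4
print(counts_batched(fast), fast.queries)     # {'Acme': 3, 'Globex': 0, 'Initech': 0} 2
```

With three corporations the difference is 4 queries versus 2; with a few hundred table rows the per-row version grows linearly while the batched one stays constant.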

I have tried to solve this with select_related, prefetch_related, annotate and Subquery, but I haven't been able to get the result I need.

Ideally, each Corporation in the queryset would be annotated with an integer type_10_count holding the number of departments of that type.

I'm sure I could do something with raw SQL in .extra(), but the docs announce that it will be deprecated (I'm on Django 1.11).

Edit: example of a raw SQL solution

corps = Corporation.objects.raw("""
SELECT
*,
(
    SELECT COUNT(*)
    FROM foo_division div
    JOIN foo_department dept ON dept.division_id = div.id
    WHERE div.corporation_id = c.id AND dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
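A correlated subquery of this shape can be checked outside Django with Python's standard-library sqlite3 module. This is a sketch: the sample rows are hypothetical, and ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data; table names follow the question's raw SQL.
SCHEMA = """
CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO foo_corporation VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO foo_division VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO foo_department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 3, 20);
"""

# The subquery joins division to department itself and is correlated
# to the outer corporation row through its WHERE clause.
TYPE_10_SQL = """
SELECT
    *,
    (
        SELECT COUNT(*)
        FROM foo_division div
        JOIN foo_department dept ON dept.division_id = div.id
        WHERE div.corporation_id = c.id AND dept.type = 10
    ) AS type_10_count
FROM foo_corporation c
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = conn.execute(TYPE_10_SQL).fetchall()
print(rows)  # [(1, 'Acme', 3), (2, 'Globex', 0), (3, 'Initech', 0)]
```

Because the subquery has no GROUP BY, COUNT(*) returns 0 rather than NULL for corporations without matching departments, including Initech, which has no divisions at all.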

Answer 1

You should be able to use a Case() expression to count the departments that have the type you're looking for:

from django.db.models import Case, IntegerField, Sum, When, Value

Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
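The Case()/Sum() annotation works by joining every department onto its corporation and adding 1 for each row whose type matches. The sketch below hand-writes SQL of roughly that shape and runs it on sqlite3; the exact SQL Django emits (join types, aliases, GROUP BY columns) is an assumption here, and the data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; table names follow the question's raw SQL.
SCHEMA = """
CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO foo_corporation VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO foo_division VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO foo_department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 3, 20);
"""

# LEFT OUTER JOINs keep corporations without divisions; their single joined row
# has dept.type = NULL, so the CASE falls through to 0 and the SUM is 0, not NULL.
SUM_CASE_SQL = """
SELECT c.id, c.name,
       SUM(CASE WHEN dept.type = 10 THEN 1 ELSE 0 END) AS type_10_count
FROM foo_corporation c
LEFT OUTER JOIN foo_division div ON div.corporation_id = c.id
LEFT OUTER JOIN foo_department dept ON dept.division_id = div.id
GROUP BY c.id, c.name
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = conn.execute(SUM_CASE_SQL).fetchall()
print(rows)  # [(1, 'Acme', 3), (2, 'Globex', 0), (3, 'Initech', 0)]
```

Because this relies on joins, the Django aggregation docs warn that combining several aggregations over different multi-valued relations in one annotate() call yields wrong results; a single chain such as division__department, as here, is not affected.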

Answer 2

I think with Subquery we can get SQL similar to the code you provided:

from django.db.models import Count, IntegerField, OuterRef, Subquery

# Count type-10 departments, grouped by division__corporation [1]
# .order_by() removes any default ordering so no extra GROUP BY columns are added [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()

# Attach departments as a Subquery to Corporation, correlated on Corporation.id.
# Departments are already grouped by division__corporation, so .values('count')
# returns at most one row with a single column, count [3]. A corporation with no
# type-10 departments gets no row, so its annotation is NULL (None in Python).
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)

The generated SQL is:

SELECT "corporation"."id", ... (other fields) ...,
  (
    SELECT COUNT("department"."id") AS "count"
    FROM "department"
    INNER JOIN "division" ON ("department"."division_id" = "division"."id") 
    WHERE (
      "department"."type" = 10 AND
      "division"."corporation_id" = ("corporation"."id")
    ) GROUP BY "division"."corporation_id"
  ) AS "departments_of_type_10"
FROM "corporation"

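One concern is that subqueries can be slow on large tables. However, the database's query optimizer may be smart enough to rewrite the subquery as an OUTER JOIN; at least, I've heard that PostgreSQL does this.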
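One way to picture the optimizer rewrite mentioned above is to decorrelate the subquery by hand: aggregate once in a derived table and LEFT OUTER JOIN it to the corporations. Whether a particular database, PostgreSQL included, actually performs this rewrite is not shown here; this sqlite3 sketch with hypothetical data only checks that both forms return the same counts. COALESCE appears in both because a grouped subquery with no matching rows yields NULL.

```python
import sqlite3

# Hypothetical sample data; table names follow the question's raw SQL.
SCHEMA = """
CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO foo_corporation VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO foo_division VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO foo_department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 3, 20);
"""

# Correlated form, mirroring the subquery above (GROUP BY inside).
CORRELATED_SQL = """
SELECT c.id,
       COALESCE((
           SELECT COUNT(dept.id)
           FROM foo_department dept
           JOIN foo_division div ON dept.division_id = div.id
           WHERE dept.type = 10 AND div.corporation_id = c.id
           GROUP BY div.corporation_id
       ), 0) AS departments_of_type_10
FROM foo_corporation c
ORDER BY c.id
"""

# Decorrelated form: count once per corporation, then join the result.
JOINED_SQL = """
SELECT c.id, COALESCE(t.cnt, 0) AS departments_of_type_10
FROM foo_corporation c
LEFT OUTER JOIN (
    SELECT div.corporation_id AS corporation_id, COUNT(dept.id) AS cnt
    FROM foo_department dept
    JOIN foo_division div ON dept.division_id = div.id
    WHERE dept.type = 10
    GROUP BY div.corporation_id
) t ON t.corporation_id = c.id
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
correlated = conn.execute(CORRELATED_SQL).fetchall()
joined = conn.execute(JOINED_SQL).fetchall()
print(correlated == joined, joined)  # True [(1, 3), (2, 0), (3, 0)]
```

On the ORM side, the counterpart of that COALESCE would presumably be wrapping the Subquery in django.db.models.functions.Coalesce together with Value(0).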

1. GROUP BY using .values and .annotate

2. order_by() problems

3. Subquery

Answer 3

I prefer the following approach:

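from django.db.models import Func, IntegerField, OuterRef, Subquery

# Func(function='Count') renders COUNT(...) but is not an Aggregate, so Django adds
# no GROUP BY and the subquery always returns exactly one row (0 when nothing matches).
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func('id', function='Count')
).values('count').order_by()

corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments, output_field=IntegerField())
)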
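The practical difference from the previous answer shows up for corporations with no type-10 departments. Assuming the ORM emits SQL of roughly the shapes below (with GROUP BY for the Count()-based answer, without it for the Func-based COUNT here), this sqlite3 sketch with hypothetical data shows that only the grouped subquery returns NULL. The type_10_counts helper is hypothetical.

```python
import sqlite3

# Hypothetical sample data; table names follow the question's raw SQL.
SCHEMA = """
CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO foo_corporation VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO foo_division VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO foo_department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 3, 20);
"""

# Shared body of both correlated subqueries; only the trailing GROUP BY differs.
SUBQUERY_BODY = """
    SELECT COUNT(dept.id)
    FROM foo_department dept
    JOIN foo_division div ON dept.division_id = div.id
    WHERE dept.type = 10 AND div.corporation_id = c.id
"""


def type_10_counts(conn, group_by):
    # With GROUP BY, an empty match produces zero groups, i.e. zero rows, i.e. NULL.
    # Without it, COUNT over an empty set still produces one row containing 0.
    subquery = SUBQUERY_BODY + ("GROUP BY div.corporation_id" if group_by else "")
    sql = f"SELECT c.name, ({subquery}) FROM foo_corporation c ORDER BY c.id"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(type_10_counts(conn, group_by=True))   # [('Acme', 3), ('Globex', None), ('Initech', None)]
print(type_10_counts(conn, group_by=False))  # [('Acme', 3), ('Globex', 0), ('Initech', 0)]
```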

For more details on this approach, see this answer:
