Django conditional Subquery aggregate
A simplified example of my model structure would be:
from django.db import models

class Corporation(models.Model):
    ...

class Division(models.Model):
    corporation = models.ForeignKey(Corporation)

class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
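Now I want to display a table of corporations in which one column contains the number of departments of a specific type, e.g. type=10. Currently this is implemented with a helper on the Corporation model that retrieves the count, e.g.: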
class Corporation(models.Model):
    ...

    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
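The problem is that this absolutely murders performance because of N+1 queries: the helper runs one extra COUNT query for every corporation shown in the table.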
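To make the N+1 pattern concrete, here is a minimal sketch that uses the standard-library sqlite3 module instead of Django. The table names, the sample rows, and the `run` and `count_type_10` helpers are all assumptions made up for illustration; the point is only to count how many queries the per-row helper pattern sends.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the three Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2), (3);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (id, division_id, type)
        VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

executed = []  # every SQL string sent to the database


def run(sql, params=()):
    """Execute a query and record it so the round trips can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def count_type_10(corporation_id):
    """Rough stand-in for Corporation.get_departments_type_10()."""
    rows = run(
        "SELECT COUNT(*) FROM department "
        "JOIN division ON department.division_id = division.id "
        "WHERE division.corporation_id = ? AND department.type = 10",
        (corporation_id,),
    )
    return rows[0][0]


# One query to list the corporations, then one more per corporation.
corporation_ids = [row[0] for row in run("SELECT id FROM corporation ORDER BY id")]
counts = {cid: count_type_10(cid) for cid in corporation_ids}
query_count = len(executed)
print(counts, query_count)
```

With three corporations this sends four queries; with N corporations it sends N+1, which is why a single annotated query is the goal.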
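I have tried to approach this with select_related, prefetch_related, annotate, and Subquery, but I haven't been able to get the result I need.
Ideally, each Corporation in the queryset would be annotated with an integer type_10_count that reflects the number of departments of that type.
I'm sure I could do something with raw SQL in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11).
EDIT: Example of a raw SQL solution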
corps = Corporation.objects.raw("""
    SELECT
        *,
        (
            SELECT COUNT(*)
            FROM foo_division div
            JOIN foo_department dept ON dept.division_id = div.id
            WHERE div.corporation_id = c.id
              AND dept.type = 10
        ) AS type_10_count
    FROM foo_corporation c
""")
You should be able to do this with a Case() expression that counts the departments of the type you're looking for:
from django.db.models import Case, IntegerField, Sum, When, Value

Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
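As a rough illustration of what this annotation asks the database to do, the sketch below runs a hand-written SUM(CASE ...) over LEFT OUTER JOINs in sqlite3. The SQL is an approximation written for this example, not the exact statement Django generates, and the table names and rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the three Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2), (3);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (id, division_id, type)
        VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

# Each joined row contributes 1 when the department has type 10, else 0.
# The LEFT OUTER JOINs keep corporations that have no divisions or departments.
rows = conn.execute("""
    SELECT c.id,
           SUM(CASE WHEN dept.type = 10 THEN 1 ELSE 0 END) AS type_10_count
    FROM corporation c
    LEFT OUTER JOIN division dv ON dv.corporation_id = c.id
    LEFT OUTER JOIN department dept ON dept.division_id = dv.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)
```

Because the count is computed over joined rows, adding a second aggregate over another multi-valued relation to the same query can inflate the numbers; the subquery approaches avoid that by counting in isolation.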
I think with Subquery we can get SQL similar to the one you provided, using this code:
from django.db.models import Count, IntegerField, OuterRef, Subquery

# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()

# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation,
# so .values('count') returns at most one row with a single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is:
SELECT "corporation"."id", ... (other fields) ...,
  (
    SELECT COUNT("department"."id") AS "count"
    FROM "department"
    INNER JOIN "division" ON ("department"."division_id" = "division"."id")
    WHERE (
      "department"."type" = 10 AND
      "division"."corporation_id" = ("corporation"."id")
    ) GROUP BY "division"."corporation_id"
  ) AS "departments_of_type_10"
FROM "corporation"
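One concern here is that the subquery can be slow on large tables. However, a database query optimizer may be smart enough to promote the subquery to an OUTER JOIN; at least I've heard PostgreSQL does this.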
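One detail of the grouped subquery is worth checking: when a corporation has no matching departments, the GROUP BY produces no rows, so the scalar subquery evaluates to NULL rather than 0. The sketch below reproduces that in sqlite3 with hypothetical tables and rows, and shows COALESCE as one possible fix.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the three Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2), (3);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (id, division_id, type)
        VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

# Correlated subquery with the same shape as the generated SQL above.
grouped_subquery = """
    SELECT COUNT(dept.id)
    FROM department dept
    INNER JOIN division dv ON dept.division_id = dv.id
    WHERE dept.type = 10 AND dv.corporation_id = c.id
    GROUP BY dv.corporation_id
"""

# Select the raw subquery value next to a COALESCE-wrapped copy of it.
rows = conn.execute(f"""
    SELECT c.id,
           ({grouped_subquery}) AS grouped_count,
           COALESCE(({grouped_subquery}), 0) AS coalesced_count
    FROM corporation c
    ORDER BY c.id
""").fetchall()
print(rows)
```

In this data set corporations 2 and 3 come back with None in the grouped column and 0 in the coalesced one. On the Django side, wrapping the annotation as Coalesce(Subquery(...), Value(0)) with Coalesce from django.db.models.functions should give the same effect, though that is a suggestion rather than part of the original answer.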
I like the following way of doing it:
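from django.db.models import Func, IntegerField, OuterRef, Subquery

departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    # Func (instead of the Count aggregate) is meant to keep Django from adding GROUP BY
    count=Func('id', function='Count', output_field=IntegerField())
).values('count').order_by()

corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments, output_field=IntegerField())
)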
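The practical difference of this variant, assuming Django emits the COUNT without a GROUP BY as intended, is that an aggregate with no grouping always yields exactly one row, so corporations with no matching departments get 0 directly. A minimal sqlite3 sketch with hypothetical tables and rows:

```python
import sqlite3

# Hypothetical in-memory tables standing in for the three Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2), (3);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (id, division_id, type)
        VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

# Correlated COUNT with no GROUP BY: one row per corporation, never NULL.
rows = conn.execute("""
    SELECT c.id,
           (
               SELECT COUNT(dept.id)
               FROM department dept
               INNER JOIN division dv ON dept.division_id = dv.id
               WHERE dept.type = 10 AND dv.corporation_id = c.id
           ) AS departments_of_type_10
    FROM corporation c
    ORDER BY c.id
""").fetchall()
print(rows)
```

It is still worth printing str(corporations.query) on your own Django version to confirm that the generated subquery really has no GROUP BY clause.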
For more details on this approach, see this answer: