Using Subquery to annotate a Count

Question

Please help me, I have been stuck on this for far too long :(

What I want to do:

I have these two models:

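from django.db import models


class Specialization(models.Model):
    name = models.CharField("name", max_length=64)


class Doctor(models.Model):
    name = models.CharField("name", max_length=128)
    # ...
    # on_delete is required from Django 2.0 on; CASCADE is an assumption here.
    specialization = models.ForeignKey(Specialization, on_delete=models.CASCADE)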

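I would like to annotate all specializations in a queryset with the number of doctors that have this specialization.

My solution so far:

I looped over the specializations and ran a simple Doctor.objects.filter(specialization=spec).count() for each one, but that issues one query per specialization and proved too slow and inefficient. The more I read, the more I realized it would make sense to use a Subquery here to filter the doctors by the OuterRef specialization. This is what I came up with: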

doctors = Doctor.objects.all().filter(specialization=OuterRef("id")) \
    .values("specialization_id") \
    .order_by()
add_doctors_count = doctors.annotate(cnt=Count("specialization_id")).values("cnt")[:1]

spec_qs_with_counts = Specialization.objects.all().annotate(
    num_applicable_doctors=Subquery(add_doctors_count, output_field=IntegerField())
)

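The output I get is just 1 for every specialization. The code seems to annotate every doctor object with its specialization_id and then annotate the count within that group, which means it will be 1.

Unfortunately, this does not completely make sense to me. In my initial attempt I used an aggregate for the count, and while it works on its own, it does not work as a Subquery. I get this error: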

This queryset contains a reference to an outer query and may only be used in a subquery.

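I posted this question before and someone suggested doing Specialization.objects.annotate(count=Count("doctor"))

However, this does not work because I need to count a specific queryset of Doctors.

I have followed these links

However, I am not getting the same result:

If you have any questions that would make this clearer, please tell me.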

Answer 1

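Counting all `Doctor`s per `Specialization`

I think you are overcomplicating things, probably because you assume that Count('doctor') counts every doctor for every specialization (regardless of that doctor's specialization). It does not: when you Count a related object like this, Django implicitly restricts the count to the objects related to each row. In fact you cannot Count('unrelated_model') at all; you can only query through relations such as ForeignKey and ManyToManyField (reverse relations included), since anything else would not make much sense.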

I would like to annotate all specializations in a queryset with the number of doctors that have this specialization.

You can do that with a simple:

# Counting all doctors per specialization (so not all doctors in general)

from django.db.models import Count

Specialization.objects.annotate(
    num_doctors=Count('doctor')
)

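Now every Specialization object in this queryset will have an extra attribute num_doctors, which is an integer (the number of doctors with that specialization).

You can also filter on the Specializations in the same query (for example, only obtain specializations that end with 'my'). As long as you do not filter on the related doctors set, the Count will cover all doctors of each specialization (see the section below on how to count a filtered set).

If you do filter on the related doctors, the related counts will leave out the doctors that the filter excludes. Furthermore, if you filter on another related object, this results in an extra JOIN that acts as a multiplier for the Counts. In that case it is better to use num_doctors=Count('doctor', distinct=True) instead. You can always use distinct=True (regardless of whether you make extra JOINs or not), but it will have a small performance impact.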
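The multiplier effect can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module; the clinic table is a hypothetical second relation invented for illustration (it is not part of the question's models), and the SQL is hand-written rather than captured from the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, specialization_id INTEGER);
    -- Hypothetical second relation, not part of the question's models.
    CREATE TABLE clinic (id INTEGER PRIMARY KEY, name TEXT, specialization_id INTEGER);
    INSERT INTO specialization VALUES (1, 'Cardiology');
    INSERT INTO doctor VALUES (1, 'Joe A', 1), (2, 'Ann B', 1);
    INSERT INTO clinic VALUES (1, 'North', 1), (2, 'South', 1), (3, 'East', 1);
""")

# Two doctors joined against three clinics yield 2 * 3 = 6 joined rows,
# so a plain COUNT sees every doctor three times.
query = """
    SELECT COUNT(doctor.id), COUNT(DISTINCT doctor.id)
    FROM specialization
    LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
    LEFT OUTER JOIN clinic ON clinic.specialization_id = specialization.id
    GROUP BY specialization.id
"""
plain_count, distinct_count = conn.execute(query).fetchone()
print(plain_count, distinct_count)
```

The plain count is inflated by the number of clinic rows, while the DISTINCT count stays at the real number of doctors, which is what distinct=True asks for.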

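The Count('doctor') annotation works because it does not simply add all doctors to the query. It makes a LEFT OUTER JOIN on the doctor table, and thus checks that the specialization_id of each Doctor is exactly the one we are looking for. So the query Django will construct looks like: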

SELECT specialization.*,
       COUNT(doctor.id) AS num_doctors
FROM specialization
LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
GROUP BY specialization.id
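To see what this kind of query returns, here is a small sketch that runs an equivalent hand-written statement with Python's built-in sqlite3 module. The table names follow the SQL above, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctor (
        id INTEGER PRIMARY KEY,
        name TEXT,
        specialization_id INTEGER REFERENCES specialization(id)
    );
    INSERT INTO specialization (id, name) VALUES
        (1, 'Cardiology'), (2, 'Neurology'), (3, 'Oncology');
    INSERT INTO doctor (id, name, specialization_id) VALUES
        (1, 'Joe A', 1), (2, 'Ann B', 1), (3, 'Joe C', 2);
""")

# COUNT(doctor.id) skips the NULL produced by the LEFT OUTER JOIN,
# so a specialization without doctors is reported with a count of 0.
rows = conn.execute("""
    SELECT specialization.name, COUNT(doctor.id) AS num_doctors
    FROM specialization
    LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
    GROUP BY specialization.id
    ORDER BY specialization.id
""").fetchall()

print(rows)  # [('Cardiology', 2), ('Neurology', 1), ('Oncology', 0)]
```

Each specialization gets its own count, and the one without any doctors still appears in the result.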

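Doing the same with a subquery will functionally give the same results, but if the Django ORM and the database management system do not find a way to optimize it, this can result in an expensive query, since it can lead to an extra subquery in the database for every specialization.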
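As a rough check of the "functionally the same" claim, the sketch below compares a JOIN-based count with a correlated subquery using Python's built-in sqlite3 module. The sample rows are made up, and the SQL is written by hand, so it only approximates what the ORM would emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, specialization_id INTEGER);
    INSERT INTO specialization VALUES (1, 'Cardiology'), (2, 'Neurology'), (3, 'Oncology');
    INSERT INTO doctor VALUES (1, 'Joe A', 1), (2, 'Ann B', 1), (3, 'Joe C', 2);
""")

# Variant 1: one JOIN plus GROUP BY.
join_rows = conn.execute("""
    SELECT specialization.name, COUNT(doctor.id)
    FROM specialization
    LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
    GROUP BY specialization.id
    ORDER BY specialization.id
""").fetchall()

# Variant 2: a correlated subquery that is evaluated per specialization row.
subquery_rows = conn.execute("""
    SELECT specialization.name,
           (SELECT COUNT(*) FROM doctor
            WHERE doctor.specialization_id = specialization.id)
    FROM specialization
    ORDER BY specialization.id
""").fetchall()

print(join_rows == subquery_rows)
```

Both variants return the same counts here; which one is cheaper depends on the database's query planner.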

Counting specific `Doctor`s per `Specialization`

But say you want to count only the doctors whose name starts with Joe. Then you can add a filter on the related doctor, like:

# Counting all doctors whose name starts with Joe, per specialization

from django.db.models import Count

Specialization.objects.filter(
    doctor__name__startswith='Joe'  # sample filter
).annotate(
    num_doctors=Count('doctor')
)
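One consequence of filtering before annotating is that specializations without any matching doctor drop out of the result instead of showing 0. The sketch below illustrates this with Python's built-in sqlite3 module and hand-written SQL (sample rows are made up), next to a conditional count that keeps every specialization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, specialization_id INTEGER);
    INSERT INTO specialization VALUES (1, 'Cardiology'), (2, 'Neurology'), (3, 'Oncology');
    INSERT INTO doctor VALUES
        (1, 'Joe A', 1), (2, 'Ann B', 1), (3, 'Joe C', 2), (4, 'Ann D', 3);
""")

# Filtering on the joined doctor rows removes specializations without a "Joe".
filtered_rows = conn.execute("""
    SELECT specialization.name, COUNT(doctor.id)
    FROM specialization
    INNER JOIN doctor ON doctor.specialization_id = specialization.id
    WHERE doctor.name LIKE 'Joe%'
    GROUP BY specialization.id
    ORDER BY specialization.id
""").fetchall()

# A conditional count keeps every specialization and reports 0 where needed.
conditional_rows = conn.execute("""
    SELECT specialization.name,
           COUNT(CASE WHEN doctor.name LIKE 'Joe%' THEN doctor.id END)
    FROM specialization
    LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
    GROUP BY specialization.id
    ORDER BY specialization.id
""").fetchall()

print(filtered_rows)
print(conditional_rows)
```

In Django 2.0 and later, Count accepts a filter argument, for example Count('doctor', filter=Q(doctor__name__startswith='Joe')), which is the closest ORM counterpart to the second query if zero counts should be kept.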

Answer 2

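Problem

The problem is that Django adds a GROUP BY as soon as it sees an aggregate function being used.

Solution

So you can create your own aggregate function, written in such a way that Django does not recognize it as an aggregate. Like this: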

from django.db.models import F, Func, OuterRef, Subquery

doctors = Doctor.objects.filter(
    specialization=OuterRef("id")
).order_by().annotate(
    count=Func(F('id'), function='Count')
).values('count')

spec_qs_with_counts = Specialization.objects.annotate(
    num_applicable_doctors=Subquery(doctors)
)
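Whether the subquery carries a GROUP BY matters at the SQL level. The sketch below uses Python's built-in sqlite3 module with hand-written SQL and made-up rows, and assumes the Func-based subquery compiles to an aggregate without GROUP BY, as this answer describes. It shows that a grouped subquery yields NULL for a specialization with no doctors, while the ungrouped one yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, specialization_id INTEGER);
    INSERT INTO specialization VALUES (1, 'Cardiology'), (2, 'Neurology'), (3, 'Oncology');
    INSERT INTO doctor VALUES (1, 'Joe A', 1), (2, 'Ann B', 1), (3, 'Joe C', 2);
""")

# With GROUP BY, an empty match produces no group and therefore no row,
# so the scalar subquery evaluates to NULL (None in Python).
grouped_rows = conn.execute("""
    SELECT specialization.name,
           (SELECT COUNT(doctor.id) FROM doctor
            WHERE doctor.specialization_id = specialization.id
            GROUP BY doctor.specialization_id)
    FROM specialization
    ORDER BY specialization.id
""").fetchall()

# Without GROUP BY, the aggregate always returns exactly one row.
ungrouped_rows = conn.execute("""
    SELECT specialization.name,
           (SELECT COUNT(doctor.id) FROM doctor
            WHERE doctor.specialization_id = specialization.id)
    FROM specialization
    ORDER BY specialization.id
""").fetchall()

print(grouped_rows)
print(ungrouped_rows)
```

If a grouped subquery is used anyway, the NULL values would need to be mapped to 0 separately.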

For more details on this method, see this answer:

Useful information can also be found in the documentation on using aggregates within a subquery expression and on Func expressions.
