Django Annotated Query to Count Only Latest from Reverse Relationship
Problem overview
Given the models:
from django.db import models

class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    # on_delete is required from Django 2.0 on; CASCADE is an assumption here
    candidate = models.ForeignKey("Candidate", related_name="status_changes", on_delete=models.CASCADE)
    status = models.ForeignKey("Status", related_name="status_changes", on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
And the SQL tables:
candidates
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Beth         |
|  2 | Mark         |
|  3 | Mike         |
|  4 | Ryan         |
+----+--------------+
status
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Review       |
|  2 | Accepted     |
|  3 | Rejected     |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
|  1 |            1 |         1 | 03-01-2019 |
|  2 |            1 |         2 | 05-01-2019 |
|  4 |            2 |         1 | 01-01-2019 |
|  5 |            3 |         1 | 01-01-2019 |
|  6 |            4 |         3 | 01-01-2019 |
+----+--------------+-----------+------------+
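I want to get the total number of candidates with a given status, but only each candidate's latest status_change should be counted.
In other words, StatusChange tracks the history of a candidate's status, but only the latest entry is considered when counting the candidate's current status.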
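To make the requirement concrete, here is a minimal plain-Python sketch of the intended logic, with no Django involved. The names STATUS_NAMES, STATUS_CHANGES and count_current_statuses are made up for illustration, and the dates are rewritten as ISO strings on the assumption that the sample tables use day-month-year.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample data.
STATUS_NAMES = {1: "Review", 2: "Accepted", 3: "Rejected"}

# (candidate_id, status_id, created_at); ISO dates so string comparison
# matches chronological order (assumes the tables use day-month-year).
STATUS_CHANGES = [
    (1, 1, "2019-01-03"),
    (1, 2, "2019-01-05"),
    (2, 1, "2019-01-01"),
    (3, 1, "2019-01-01"),
    (4, 3, "2019-01-01"),
]


def count_current_statuses(changes):
    """Count candidates per status, using only each candidate's latest change."""
    latest = {}  # candidate_id -> (created_at, status_id)
    for candidate_id, status_id, created_at in changes:
        if candidate_id not in latest or created_at > latest[candidate_id][0]:
            latest[candidate_id] = (created_at, status_id)
    return Counter(STATUS_NAMES[status_id] for _, status_id in latest.values())


counts = count_current_statuses(STATUS_CHANGES)
for name, count in counts.most_common():
    print(name, count)
```

Candidate 1 has two rows, but only the later one (Accepted) contributes, so Review ends up with 2 rather than 3. The goal is to get the database to do this instead of Python.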
SQL solution
With SQL, I was able to achieve this using GROUP BY and COUNT.
(SQL untested)
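SELECT
    status.id AS status_id,
    status.name AS status_name,
    COUNT(*) AS status_count
FROM (
    -- latest status change per candidate
    SELECT
        candidate_id,
        MAX(created_at) AS latest_status_change
    FROM status_change
    GROUP BY candidate_id
) AS latest_change
INNER JOIN status_change
    ON status_change.candidate_id = latest_change.candidate_id
    AND status_change.created_at = latest_change.latest_status_change
INNER JOIN status
    ON status.id = status_change.status_id
GROUP BY status.id, status.name
ORDER BY status_count DESC;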
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
|         1 | Review      |      2 | # <= Does not include instance from candidate 1
|         2 | Accepted    |      1 | #    because status 2 is latest
|         3 | Rejected    |      1 |
+-----------+-------------+--------+
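Attempted Django solution
I need a view to return each status and its corresponding count,
e.g. [{ status_name: "Review", count: 2 }, ...]
I'm not sure how to build this queryset without pulling all the records and aggregating in Python.
I figure I need annotate() and possibly Subquery, but I can't stitch them together.
The closest I've got is this, which counts the number of status changes for each status, but it also counts the non-latest changes.
queryset = Status.objects.all().annotate(case_count=Count("status_changes"))
I've found lots of SO questions on aggregating, but I couldn't find a clear answer on aggregating and annotating only the "latest".
Thanks in advance.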
We can perform a query that first filters down to the last StatusChange per Candidate, and then counts the statuses:
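from django.db.models import Count, F, Max

qs = Status.objects.filter(
    # keep only the status changes that are the latest for their candidate
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    # count the remaining (latest) status changes per status
    nlast=Count('status_changes')
)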
For the given sample data, this gives us:
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]