Django: Calculate and group inside the Django ORM with multiple columns

Good day,

I am currently trying to improve my knowledge of the Django ORM, but I am struggling with the following task.

First, the database looks like this:

from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class DeathsWorldwide(models.Model):
    causes_name = models.CharField(max_length=50, null=False, blank=False)
    death_numbers = models.PositiveIntegerField(default=0)
    country = models.CharField(max_length=50, null=True, blank=True)
    year = models.PositiveIntegerField(null=False, blank=False, validators=[MinValueValidator(1990), MaxValueValidator(2019)])


causes_name    |    death_numbers    |    country    |    year
Alcohol dis.   |    25430            |    Germany    |    1998
Poisoning      |    4038             |    Germany    |    1998
...
Maternal dis.  |    9452             |    Germany    |    1998
Alcohol dis.   |    21980            |    Germany    |    1999
Poisoning      |    5117             |    Germany    |    1999
...
Maternal dis.  |    8339             |    Germany    |    1999

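For every year and every country, all diseases always form one block. The years range from 1990 to 2019.

What I want to achieve, or rather what the task requires, is a list of all countries with the calculated number of deaths, like this: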

country    |    death_numbers
France     |    78012
Germany    |    70510
Austria    |    38025

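...but with one additional requirement: for each country, the deaths from 1990-1999 must be subtracted from the deaths from 2000-2019. So the full list would actually look like this: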

country    |    death_numbers    |    19xx    |    2xxx
France     |    78012            |    36913   |    114925
Germany    |    70510            |    ...     |    ...
Austria    |    38025            |    ...     |    ...
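
To make the sign convention concrete, the France row above can be checked in plain Python. This is only a sanity check of the arithmetic; `net_deaths` is a hypothetical helper name and not part of the model.

```python
def net_deaths(deaths_19xx, deaths_2xxx):
    """Deaths in 2000-2019 minus deaths in 1990-1999 for one country."""
    return deaths_2xxx - deaths_19xx


# France row from the table above: 19xx = 36913, 2xxx = 114925
france = net_deaths(36913, 114925)
print(france)  # 78012, matching the death_numbers column
```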

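Is it possible to achieve such a result with just one query?

Thanks for your help, and have a nice day!

Something like the following should do the trick.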

from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20, total=t20 - t19)

It produces the following SQL:

SELECT 
    "base_deathsworldwide"."country", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" < 2000)
        AS "t19", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        AS "t20", 
    (
        SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" >= 2000) 
        - SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" < 2000)
    ) AS "total" 
FROM "base_deathsworldwide" 
GROUP BY "base_deathsworldwide"."country"
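
The SQL above is conditional aggregation: each `SUM` only sees the rows that pass its filter. As a rough way to try the idea without a Django project, here is a sketch using the standard-library `sqlite3` module. It is an illustration under assumptions, not what Django runs: the table name `deaths` and the helper `totals_by_country` are made up, the 2000+ rows and the country "Nowhere" are invented sample data, and `SUM(CASE WHEN ...)` is used as the portable spelling of `SUM(...) FILTER (WHERE ...)`.

```python
import sqlite3

# The Germany 1998/1999 figures come from the sample table in the question;
# every row marked "hypothetical" is invented for illustration.
ROWS = [
    ("Alcohol dis.", 25430, "Germany", 1998),
    ("Poisoning", 4038, "Germany", 1998),
    ("Alcohol dis.", 21980, "Germany", 1999),
    ("Poisoning", 5117, "Germany", 1999),
    ("Alcohol dis.", 40000, "Germany", 2000),  # hypothetical
    ("Poisoning", 30000, "Germany", 2001),     # hypothetical
    ("Poisoning", 100, "Nowhere", 2005),       # hypothetical, no pre-2000 rows
]

# CASE WHEN inside SUM restricts each sum to one side of the year split.
QUERY = """
SELECT country,
       SUM(CASE WHEN year < 2000 THEN death_numbers END) AS t19,
       SUM(CASE WHEN year >= 2000 THEN death_numbers END) AS t20,
       SUM(CASE WHEN year >= 2000 THEN death_numbers END)
         - SUM(CASE WHEN year < 2000 THEN death_numbers END) AS total
FROM deaths
GROUP BY country
ORDER BY country
"""


def totals_by_country(rows):
    """Load rows into an in-memory table and return one tuple per country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE deaths "
            "(causes_name TEXT, death_numbers INTEGER, country TEXT, year INTEGER)"
        )
        conn.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = totals_by_country(ROWS)
for row in result:
    print(row)
```

Note that a country with no rows on one side of the split gets NULL (`None` in Python) for that sum, and therefore for `total` as well. If 0 is wanted instead, `COALESCE(..., 0)` in SQL, or `Coalesce` from `django.db.models.functions` in the ORM, is one option.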

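The ORM version is a single query, but the generated SQL repeats the calculation of each sum. It looks like the ORM does not support avoiding that, but we can try to build the raw SQL with minimal effort: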

from django.db import connection  # replace it with connections if using multiple databases
from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
base_query = DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20).query
sql, params = base_query.sql_with_params()
template = 'SELECT country, t19, t20, (t20 - t19) AS total FROM ({}) "temp"'

with connection.cursor() as cursor:
    # execute() is not guaranteed to return the cursor, so fetch from the cursor itself
    cursor.execute(template.format(sql), params)
    data = cursor.fetchall()

print(data)
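
The raw-SQL approach wraps the aggregate query in a derived table, so the outer `SELECT` can refer to `t19` and `t20` by name. Here is a standalone sketch of the same pattern using `sqlite3`, so it runs without Django. The assumptions: the table name `deaths` and the helper `fetch_totals` are made up, the France totals from the question are collapsed into two illustrative rows, and `sqlite3` uses `?` placeholders where Django's cursor uses `%s`.

```python
import sqlite3

# Inner query: the two conditional sums, parameterised on the split year.
INNER_SQL = """
SELECT country,
       SUM(CASE WHEN year < ? THEN death_numbers END) AS t19,
       SUM(CASE WHEN year >= ? THEN death_numbers END) AS t20
FROM deaths
GROUP BY country
"""

# Outer query: reuse the aliases instead of repeating the SUM expressions.
OUTER_TEMPLATE = (
    'SELECT country, t19, t20, (t20 - t19) AS total '
    'FROM ({}) "temp" ORDER BY country'
)


def fetch_totals(conn, split_year=2000):
    """Run the wrapped query and return (country, t19, t20, total) tuples."""
    cursor = conn.cursor()
    try:
        # Execute first, then fetch from the cursor; this works on any DB-API driver.
        cursor.execute(OUTER_TEMPLATE.format(INNER_SQL), (split_year, split_year))
        return cursor.fetchall()
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deaths (death_numbers INTEGER, country TEXT, year INTEGER)")
# Illustrative rows: the France totals from the question, one row per period.
conn.executemany(
    "INSERT INTO deaths VALUES (?, ?, ?)",
    [(36913, "France", 1995), (114925, "France", 2010)],
)
totals = fetch_totals(conn)
conn.close()
print(totals)  # [('France', 36913, 114925, 78012)]
```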

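I suggest measuring the performance of both options and comparing them. If the difference is small (or the raw SQL version is even slower, which is also possible because plan optimization may take more time), or if the ORM version is fast enough for your use case, it is better to stick with the ORM solution.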
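
One rough way to do that measurement is a small timing helper like the one below. It is only a sketch: `best_time`, `run_orm` and `run_raw` are hypothetical names, and timings taken on a development database may not carry over to production data volumes.

```python
import time


def best_time(fn, repeat=5):
    """Return the shortest wall-clock duration in seconds over `repeat` calls of fn()."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage, where run_orm() builds and evaluates the queryset
# (for example with list(...)) and run_raw() executes the raw SQL:
#
#     orm_seconds = best_time(run_orm)
#     raw_seconds = best_time(run_raw)
```

Build the queryset inside the callable rather than reusing one object: querysets are lazy, and an already evaluated queryset serves later iterations from its result cache, which would make the ORM version look faster than it is.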