Django: Calculate and Group inside Django ORM with multiple columns
Good day,
I'm currently trying to improve my knowledge of the Django ORM, but I'm struggling with the following task.
But first, the database looks like this:
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models

class DeathsWorldwide(models.Model):
    causes_name = models.CharField(max_length=50, null=False, blank=False)
    death_numbers = models.PositiveIntegerField(default=0)
    country = models.CharField(max_length=50, null=True, blank=True)
    year = models.PositiveIntegerField(null=False, blank=False, validators=[MinValueValidator(1990), MaxValueValidator(2019)])
causes_name | death_numbers | country | year
Alcohol dis. | 25430 | Germany | 1998
Poisoning | 4038 | Germany | 1998
...
Maternal dis. | 9452 | Germany | 1998
Alcohol dis. | 21980 | Germany | 1999
Poisoning | 5117 | Germany | 1999
...
Maternal dis. | 8339 | Germany | 1999
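For each year and each country, all the diseases always form one block. The years range from 1990 to 2019.
What I, or rather the task, want to achieve is a list of all countries with their calculated death numbers, like this: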
country | death_numbers
France | 78012
Germany | 70510
Austria | 38025
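...but with one additional feature: for each country, the death numbers for 1990-1999 must be subtracted from the death numbers for 2000-2019. So the full list would actually look something like this: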
country | death_numbers | 19xx | 2xxx
France | 78012 | 36913 | 114925
Germany | 70510 | ... | ...
Austria | 38025 | ... | ...
Is it possible to achieve such a result with just one query?
Thanks for your help, and have a great day!
Something like the following should do the trick.
from django.db.models import Sum, Q
t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20, total=t20 - t19)
It produces the following SQL:
SELECT
    "base_deathsworldwide"."country",
    SUM("base_deathsworldwide"."death_numbers")
        FILTER (WHERE "base_deathsworldwide"."year" < 2000)
        AS "t19",
    SUM("base_deathsworldwide"."death_numbers")
        FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        AS "t20",
    (
        SUM("base_deathsworldwide"."death_numbers")
            FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        - SUM("base_deathsworldwide"."death_numbers")
            FILTER (WHERE "base_deathsworldwide"."year" < 2000)
    ) AS "total"
FROM "base_deathsworldwide"
GROUP BY "base_deathsworldwide"."country"
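One caveat worth checking on the query above: a filtered sum has nothing to add up for a country with no rows in one of the two periods, so SQL returns NULL (None in Python) for that sum and for the difference. The sketch below reproduces the shape of the query with Python's standard sqlite3 module, so it can be run without a Django project. The table name deaths and the sample rows are hypothetical, and CASE WHEN is used as a portable stand-in for the FILTER clause, since FILTER support depends on the SQLite version.

```python
import sqlite3

# Hypothetical sample rows: (causes_name, death_numbers, country, year).
ROWS = [
    ("Alcohol dis.", 100, "Germany", 1998),
    ("Poisoning", 50, "Germany", 1999),
    ("Alcohol dis.", 400, "Germany", 2005),
    ("Poisoning", 70, "Iceland", 2010),  # Iceland has no rows before 2000
]

# Same shape as the ORM-generated SQL, with CASE WHEN instead of FILTER.
QUERY = """
SELECT
    country,
    SUM(CASE WHEN year < 2000 THEN death_numbers END) AS t19,
    SUM(CASE WHEN year >= 2000 THEN death_numbers END) AS t20,
    SUM(CASE WHEN year >= 2000 THEN death_numbers END)
        - SUM(CASE WHEN year < 2000 THEN death_numbers END) AS total
FROM deaths
GROUP BY country
ORDER BY country
"""


def conditional_totals(rows):
    """Load rows into an in-memory table and return one tuple per country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE deaths "
            "(causes_name TEXT, death_numbers INTEGER, country TEXT, year INTEGER)"
        )
        conn.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in conditional_totals(ROWS):
    print(row)
# ('Germany', 150, 400, 250)
# ('Iceland', None, 70, None)
```

If zeros are preferable to None in the ORM version, wrapping each aggregate as Coalesce(Sum(...), 0), with Coalesce imported from django.db.models.functions, is one way to get them.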
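The ORM version is a single query, but it writes out each sum twice. It looks like the ORM does not support reusing them, but we can try to build raw SQL with minimal effort: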
from django.db import connection  # use django.db.connections[alias] with multiple databases
from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
# Let the ORM build the grouped query, then take its SQL and parameters.
base_query = DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20).query
sql, params = base_query.sql_with_params()
# Wrap it in an outer SELECT so each sum is written once and reused by alias.
template = 'SELECT country, t19, t20, (t20 - t19) AS total FROM ({}) "temp"'

with connection.cursor() as cursor:
    # The return value of execute() depends on the database driver,
    # so fetch the rows from the cursor itself.
    cursor.execute(template.format(sql), params)
    data = cursor.fetchall()

print(data)
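To try the wrapping technique without a Django project, here is a minimal sketch with the standard sqlite3 module. It follows the same steps: an inner grouped query with its parameters, an outer template that refers to the sums by alias, and an explicit fetchall() on the cursor. The table name deaths, the alias per_country, and the sample rows are hypothetical, and the inner SQL is written by hand here rather than generated by the ORM.

```python
import sqlite3

# Hypothetical sample rows: (death_numbers, country, year).
ROWS = [
    (100, "Germany", 1998),
    (50, "Germany", 1999),
    (400, "Germany", 2005),
    (30, "France", 1995),
    (90, "France", 2001),
]

# Inner query: one row per country with both sums; the cutoff year is a parameter.
INNER_SQL = """
SELECT
    country,
    SUM(CASE WHEN year < ? THEN death_numbers ELSE 0 END) AS t19,
    SUM(CASE WHEN year >= ? THEN death_numbers ELSE 0 END) AS t20
FROM deaths
GROUP BY country
"""
INNER_PARAMS = (2000, 2000)

# Outer query: reuse the aliases instead of repeating the aggregates.
TEMPLATE = (
    "SELECT country, t19, t20, (t20 - t19) AS total "
    "FROM ({}) AS per_country ORDER BY country"
)


def totals_via_subquery(rows):
    """Return (country, t19, t20, total) tuples computed through a derived table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE deaths (death_numbers INTEGER, country TEXT, year INTEGER)"
        )
        conn.executemany("INSERT INTO deaths VALUES (?, ?, ?)", rows)
        cursor = conn.cursor()
        cursor.execute(TEMPLATE.format(INNER_SQL), INNER_PARAMS)
        return cursor.fetchall()
    finally:
        conn.close()


for row in totals_via_subquery(ROWS):
    print(row)
# ('France', 30, 90, 60)
# ('Germany', 150, 400, 250)
```

Only the inner SQL text is spliced into the template; the values still travel as bound parameters, which is the same split that sql_with_params() gives you.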
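I would suggest measuring the performance of both options and comparing them. If the difference is small (or the raw SQL version is actually slower, which is possible because planning the query may take more time), or if the ORM version is fast enough for your use case, it is better to stick with the ORM solution.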
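A rough way to run such a comparison is sketched below, again with the standard sqlite3 module and generated data. Everything in it is an assumption for illustration: the table name deaths, the synthetic numbers, and the helper names build_db and best_time. Timings from an in-memory SQLite table say little about your production database, so treat this only as a template for the measurement itself.

```python
import sqlite3
import time

# Query that repeats both aggregates to compute the difference.
REPEATED = """
SELECT country,
       SUM(CASE WHEN year < 2000 THEN death_numbers ELSE 0 END) AS t19,
       SUM(CASE WHEN year >= 2000 THEN death_numbers ELSE 0 END) AS t20,
       SUM(CASE WHEN year >= 2000 THEN death_numbers ELSE 0 END)
         - SUM(CASE WHEN year < 2000 THEN death_numbers ELSE 0 END) AS total
FROM deaths
GROUP BY country
ORDER BY country
"""

# Query that computes the sums once in a derived table and subtracts outside.
WRAPPED = """
SELECT country, t19, t20, (t20 - t19) AS total
FROM (
    SELECT country,
           SUM(CASE WHEN year < 2000 THEN death_numbers ELSE 0 END) AS t19,
           SUM(CASE WHEN year >= 2000 THEN death_numbers ELSE 0 END) AS t20
    FROM deaths
    GROUP BY country
) AS per_country
ORDER BY country
"""


def build_db(n_countries=50, first_year=1990, last_year=2019):
    """Create an in-memory table filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deaths (death_numbers INTEGER, country TEXT, year INTEGER)"
    )
    rows = [
        (100 + c + y % 7, f"country_{c:02d}", y)
        for c in range(n_countries)
        for y in range(first_year, last_year + 1)
    ]
    conn.executemany("INSERT INTO deaths VALUES (?, ?, ?)", rows)
    return conn


def best_time(conn, sql, repeats=20):
    """Return (best wall-clock seconds, rows) over `repeats` runs of `sql`."""
    best = float("inf")
    rows = None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


conn = build_db()
t_repeated, rows_repeated = best_time(conn, REPEATED)
t_wrapped, rows_wrapped = best_time(conn, WRAPPED)
conn.close()
print(f"repeated aggregates: {t_repeated:.6f}s, wrapped subquery: {t_wrapped:.6f}s")
```

In a Django project, the equivalent would be to time list(queryset) for the ORM version against the cursor-based version on realistic data. QuerySet.explain() can also show the plan the database chooses for the ORM query.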