Filtering a query that uses generate_series while keeping null rows

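I'm running a raw query in Django so that I can use PostgreSQL's generate_series function and get one row for each date in an interval, comparing each generated date against a datetime range (order_dates) on the BaseEntry model (this isn't directly possible in the ORM).

I'm using an unmanaged model: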

from django.db import models


class OrderBinnedStat(models.Model):
    bin = models.DateTimeField()
    id_count = models.IntegerField()
    avg_flow = models.IntegerField()
    sum_flow = models.IntegerField()

    class Meta:
        managed = False

and I run the code below to execute the query and inspect the results:

from django.utils import timezone


timezone.activate("UTC")

end_dt = timezone.datetime(year=2021, month=12, day=26, hour=0, minute=0, second=0)
end_dt = timezone.make_aware(end_dt)

start_dt = end_dt - timezone.timedelta(days=1)
minutes = 60

report = OrderBinnedStat.objects.raw("""
SELECT row_number() OVER () AS id,
       dt_range.bin,
       count(DISTINCT t.id) AS id_count,
       avg(DISTINCT t.flow) AS avg_flow,
       sum(DISTINCT t.flow) AS sum_flow
FROM (
    SELECT generate_series(%s, %s - interval '1 milliseconds', interval '%s min')
    FROM public.orders_baseentry t -- is this line needed?
    ) dt_range(bin)
LEFT JOIN public.orders_baseentry t ON t.order_dates @> dt_range.bin
-- WHERE t.status = ANY(ARRAY['new', 'cancelled'])
GROUP  BY dt_range.bin
ORDER  BY dt_range.bin;
""", [start_dt, end_dt, minutes])

for item in report:
    print(item.id, item.bin, item.id_count, item.avg_flow, item.sum_flow)

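This works well when I don't filter the baseentry table. As expected, I get 24 rows (one per hour of the requested interval). No order_dates overlap 2021-12-25 21:00:00+00:00 or later, so those rows have an id_count of 0 and None for avg_flow and sum_flow. Perfect!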

1 2021-12-25 00:00:00+00:00 1 0.65128747160000000000 0.6512874716
2 2021-12-25 01:00:00+00:00 1 0.65128747160000000000 0.6512874716
3 2021-12-25 02:00:00+00:00 1 0.65128747160000000000 0.6512874716
4 2021-12-25 03:00:00+00:00 1 0.65128747160000000000 0.6512874716
5 2021-12-25 04:00:00+00:00 1 0.65128747160000000000 0.6512874716
6 2021-12-25 05:00:00+00:00 1 0.65128747160000000000 0.6512874716
7 2021-12-25 06:00:00+00:00 1 0.65128747160000000000 0.6512874716
8 2021-12-25 07:00:00+00:00 1 0.65128747160000000000 0.6512874716
9 2021-12-25 08:00:00+00:00 1 0.65128747160000000000 0.6512874716
10 2021-12-25 09:00:00+00:00 1 0.65128747160000000000 0.6512874716
11 2021-12-25 10:00:00+00:00 1 0.65128747160000000000 0.6512874716
12 2021-12-25 11:00:00+00:00 1 0.65128747160000000000 0.6512874716
13 2021-12-25 12:00:00+00:00 1 0.65128747160000000000 0.6512874716
14 2021-12-25 13:00:00+00:00 1 0.65128747160000000000 0.6512874716
15 2021-12-25 14:00:00+00:00 1 0.65128747160000000000 0.6512874716
16 2021-12-25 15:00:00+00:00 1 0.65128747160000000000 0.6512874716
17 2021-12-25 16:00:00+00:00 1 0.65128747160000000000 0.6512874716
18 2021-12-25 17:00:00+00:00 1 0.65128747160000000000 0.6512874716
19 2021-12-25 18:00:00+00:00 1 0.65128747160000000000 0.6512874716
20 2021-12-25 19:00:00+00:00 1 0.65128747160000000000 0.6512874716
21 2021-12-25 20:00:00+00:00 1 0.65128747160000000000 0.6512874716
22 2021-12-25 21:00:00+00:00 0 None None
23 2021-12-25 22:00:00+00:00 0 None None
24 2021-12-25 23:00:00+00:00 0 None None
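
To make the 24-row expectation concrete, here is a minimal pure-Python sketch of the bins that generate_series(start, stop, step) should produce for these parameters. The helper name generate_bins is hypothetical and only mimics the function's behavior (the stop value is inclusive); it is not how PostgreSQL implements it.

```python
from datetime import datetime, timedelta, timezone


def generate_bins(start, stop, step):
    """Mimic generate_series(start, stop, step); stop is inclusive."""
    bins = []
    current = start
    while current <= stop:
        bins.append(current)
        current += step
    return bins


end_dt = datetime(2021, 12, 26, tzinfo=timezone.utc)
start_dt = end_dt - timedelta(days=1)

# Subtracting 1 ms from the stop value excludes the bin at end_dt itself.
bins = generate_bins(
    start_dt,
    end_dt - timedelta(milliseconds=1),
    timedelta(minutes=60),
)

print(len(bins), bins[0], bins[-1])
# 24 2021-12-25 00:00:00+00:00 2021-12-25 23:00:00+00:00
```

Without the 1 millisecond subtraction the series would include 2021-12-26 00:00:00+00:00 as a 25th bin, because the stop value is inclusive.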

But if I uncomment the WHERE clause to start filtering (or add any other filter), I get the following result:

1 2021-12-25 00:00:00+00:00 1 0.65128747160000000000 0.6512874716
2 2021-12-25 01:00:00+00:00 1 0.65128747160000000000 0.6512874716
3 2021-12-25 02:00:00+00:00 1 0.65128747160000000000 0.6512874716
4 2021-12-25 03:00:00+00:00 1 0.65128747160000000000 0.6512874716
5 2021-12-25 04:00:00+00:00 1 0.65128747160000000000 0.6512874716
6 2021-12-25 05:00:00+00:00 1 0.65128747160000000000 0.6512874716
7 2021-12-25 06:00:00+00:00 1 0.65128747160000000000 0.6512874716
8 2021-12-25 07:00:00+00:00 1 0.65128747160000000000 0.6512874716
9 2021-12-25 08:00:00+00:00 1 0.65128747160000000000 0.6512874716
10 2021-12-25 09:00:00+00:00 1 0.65128747160000000000 0.6512874716
11 2021-12-25 10:00:00+00:00 1 0.65128747160000000000 0.6512874716
12 2021-12-25 11:00:00+00:00 1 0.65128747160000000000 0.6512874716
13 2021-12-25 12:00:00+00:00 1 0.65128747160000000000 0.6512874716
14 2021-12-25 13:00:00+00:00 1 0.65128747160000000000 0.6512874716
15 2021-12-25 14:00:00+00:00 1 0.65128747160000000000 0.6512874716
16 2021-12-25 15:00:00+00:00 1 0.65128747160000000000 0.6512874716
17 2021-12-25 16:00:00+00:00 1 0.65128747160000000000 0.6512874716
18 2021-12-25 17:00:00+00:00 1 0.65128747160000000000 0.6512874716
19 2021-12-25 18:00:00+00:00 1 0.65128747160000000000 0.6512874716
20 2021-12-25 19:00:00+00:00 1 0.65128747160000000000 0.6512874716
21 2021-12-25 20:00:00+00:00 1 0.65128747160000000000 0.6512874716

I need to be able to filter the baseentry table while keeping the null rows.

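Replace WHERE t.status = ANY(ARRAY['new', 'cancelled']) with AND t.status = ANY(ARRAY['new', 'cancelled']) and it works! The condition then becomes part of the LEFT JOIN's ON clause. A WHERE clause is applied after the join, so it discards the bins with no matching entry, because t.status is NULL there and the comparison is not true. A condition in the ON clause only restricts which baseentry rows can match, so every generated bin is still returned.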
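
The difference between filtering in WHERE and filtering in ON can be reproduced without a PostgreSQL server. The sketch below is an illustration under stated assumptions, not the original query: it uses Python's built-in sqlite3 module as a stand-in, a hypothetical bins table in place of generate_series, and a hypothetical entries table with integer start_hour/end_hour columns in place of the order_dates range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "bins" stands in for generate_series output,
# "entries" for orders_baseentry with a half-open [start_hour, end_hour) range.
conn.executescript("""
    CREATE TABLE bins (bin INTEGER PRIMARY KEY);
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        start_hour INTEGER,
        end_hour INTEGER,
        status TEXT
    );
    INSERT INTO bins (bin) VALUES (0), (1), (2), (3);
    INSERT INTO entries VALUES (1, 0, 2, 'new'), (2, 1, 3, 'done');
""")

# The {filter} slot sits right after the ON condition, so it can hold
# either a WHERE clause or an extra AND that extends the ON clause.
BASE = """
    SELECT b.bin, count(e.id) AS id_count
    FROM bins b
    LEFT JOIN entries e
           ON e.start_hour <= b.bin AND b.bin < e.end_hour
    {filter}
    GROUP BY b.bin
    ORDER BY b.bin
"""

# Filter applied after the join: bins without a matching 'new' or
# 'cancelled' entry are dropped entirely.
where_rows = conn.execute(
    BASE.format(filter="WHERE e.status IN ('new', 'cancelled')")
).fetchall()

# Filter applied as part of the join condition: every bin is kept.
on_rows = conn.execute(
    BASE.format(filter="AND e.status IN ('new', 'cancelled')")
).fetchall()

print(where_rows)  # [(0, 1), (1, 1)]
print(on_rows)     # [(0, 1), (1, 1), (2, 0), (3, 0)]
```

With the WHERE version, bins 2 and 3 disappear, just like the rows from 21:00 onward in the question. With the AND version they come back with a count of 0.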