PostgreSQL select statement to return rows after where condition

Question

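I'm working on a query that returns the next 7 days' worth of data every time an event occurs, as indicated by "where event = 1". The goal is then to group all the data by user ID and run aggregate functions on the data that follows the event. The event is encoded as a binary flag [0, 1].

So far I've been trying to use nested select statements to shape the data the way I want, but the window functions are starting to restrict me. I now think a self-join might be more appropriate, but I need help constructing such a query.

The query currently first creates daily values grouped by user and date (3rd-level nested select). Then the 2nd level sums "value_x" to get an aggregate value per user and day. Then the 1st-level nested select uses the lead function, partitioned by user, to grab the next row's value, which acts as selecting the next day's value when event = 1. Finally, the outer select uses an aggregate function to compute the average "sum_next_day_value_after_event" grouped by user and where event = 1. Put together: where event = 1, the query returns the avg(value_x) of the next row's total value_x.

However, this doesn't follow my time rule: "where event = 1", return the next 7 days' worth of data after the event. If there are fewer than 7 days of data, return whatever data falls within those 7 days. Yes, I currently have only one lead with an offset of 1, but you could add 6 more of these functions to grab the next 6 rows. The problem is that lead just grabs the next row without regard to its date, so in theory the next row's "value_x" could be 15 days away from the row where "event = 1". Also, as the data table below shows, a user may have more than one row per day.

Here is the query I have so far: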

select
    f.user_id,
    avg(f.sum_next_day_value_after_event) as sum_next_day_values
from (
    -- 1st level: next row's daily total per user (regardless of how far apart the dates are)
    select
        bld.user_id,
        lead(bld.sum_daily_value_x, 1) over(partition by bld.user_id order by bld.daily) as sum_next_day_value_after_event
    from (
        -- 2nd level: sum value_x per user and day
        select
            l.user_id,
            l.daily,
            sum(l.value_x) as sum_daily_value_x
        from (
            -- 3rd level: one row per user, day and value_x
            select
                user_id, value_x, date_part('day', day_ts) as daily
            from table_1
            group by date_part('day', day_ts), user_id, value_x) l
        group by l.user_id, l.daily
        order by l.user_id) bld) f
group by f.user_id

Here is a snippet of data from table_1:

user_id	day_ts	value_x	event
50	4/2/21 07:37	25	0
50	4/2/21 07:42	45	0
50	4/2/21 09:14	67	1
50	4/5/21 10:09	8	0
50	4/5/21 10:24	75	0
50	4/8/21 11:08	34	0
50	4/15/21 13:09	32	1
50	4/16/21 14:23	12	0
50	4/29/21 14:34	90	0
55	4/4/21 15:31	12	0
55	4/5/21 15:23	34	0
55	4/17/21 18:58	32	1
55	4/17/21 19:00	66	1
55	4/18/21 19:57	54	0
55	4/23/21 20:02	34	0
55	4/29/21 20:39	57	0
55	4/30/21 21:46	43	0

Technical details:

PostgreSQL, supported by EDB, version = 14.1

pgAdmin4, version 5.7

Thanks for the help!

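Answer 1

"The query currently first creates daily aggregate values"

I don't see any aggregate function in your first query, so its GROUP BY clause aggregates nothing. Its only effect is to collapse duplicate (day, user_id, value_x) rows, which would drop repeated values from the daily sum and is presumably not what you want.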

select
    user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x

can be simplified to

select
    user_id, value_x, date_part('day', day_ts) as daily
from table_1

This in turn adds no real value, so the first query can be removed and the second query becomes:

select user_id
     , date_part('day', day_ts) as daily
     , sum(value_x) as sum_daily_value_x
 from table_1
group by user_id, date_part('day', day_ts)

The order by user_id clause can also be removed at this step.
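One caveat worth checking before building on this query: date_part('day', ...) returns only the day of the month as a number (1 to 31), so rows from the same day of different months land in the same group. The following is a minimal sketch, assuming day_ts is a timestamp column, that groups by the calendar date instead:

```sql
-- Assumption: day_ts is of type timestamp (or timestamptz).
-- date_part('day', ...) yields only the day of the month, e.g. 2 for both
-- 2021-04-02 and 2021-05-02; casting to date keeps the year and month too.
select user_id
     , day_ts::date as daily              -- calendar date of the row
     , sum(value_x) as sum_daily_value_x  -- total value_x for that user and date
  from table_1
 group by user_id, day_ts::date
 order by user_id, daily;
```

With the sample rows from the question, user 50 would get 137 (25 + 45 + 67) for April 2, 2021.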

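Now if you want to calculate the average of sum_daily_value_x over the 7 days after the event (I mean the avg() function), you can use avg() as a window function and restrict its frame to that 7-day period. For an interval offset such as '7 days' to be accepted in a range frame, the order by column must be a date or timestamp, so daily is computed here as day_ts::date rather than date_part('day', day_ts), which returns the day of the month as a number:

select f.user_id
     , f.daily
     , avg(f.sum_daily_value_x) over (partition by f.user_id order by f.daily range between current row and '7 days' following) as sum_next_day_values
  from (
        select user_id
             , day_ts::date as daily
             , sum(value_x) as sum_daily_value_x
          from table_1
         group by user_id, day_ts::date
       ) AS f

The partition by f.user_id clause in the window function is needed so that each frame only contains rows of the same user. The outer query has no group by f.user_id: window functions are evaluated after grouping, so f.daily and f.sum_daily_value_x could not be referenced there without being grouped or aggregated. Note that the frame starts at the current row, so it includes the current day itself, and that this query returns one row per user and day, not only the days on which an event occurred.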

You can replace the avg() window function with any other one; for instance, sum() may fit the alias sum_next_day_values better.
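The question also asks to keep only the rows that follow an "event = 1" row by at most 7 days, and mentions a self-join. One possible way to express that rule, not taken from the answer above, is a self-join written as an exists subquery. This is only a sketch, assuming day_ts is a timestamp, event is an integer flag, and a row that follows several events should be counted once; the output aliases are made up for illustration:

```sql
-- Assumptions: day_ts is a timestamp, event is an integer flag (0 or 1).
-- d = candidate data rows, e = event rows of the same user.
select d.user_id
     , sum(d.value_x) as sum_value_x_after_event  -- hypothetical alias
     , avg(d.value_x) as avg_value_x_after_event  -- hypothetical alias
  from table_1 as d
 where exists (
        select 1
          from table_1 as e
         where e.user_id = d.user_id
           and e.event   = 1
           and d.day_ts  >  e.day_ts                      -- strictly after the event
           and d.day_ts  <= e.day_ts + interval '7 days'  -- at most 7 days later
       )
 group by d.user_id
 order by d.user_id;
```

If the sample rows are loaded with day_ts parsed as month/day/year, user 50 keeps four rows (8, 75, 34 and 12), for a sum of 129. Using exists rather than a plain join avoids counting a row twice when it falls within 7 days of two different events, as happens for user 55 on April 17; a plain join on the same conditions would instead give one result row per event and data row pair.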

postgresql

self-join

where-clause

window-functions