SQL: group query results by user ID and date ranges dynamically

Question

I have a query that collects information from different tables for a specific time range.

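Currently I issue a separate request for each user and each date range, but I would like to run all time ranges at once. The time ranges are every 7 days between user_opened_account_at and user_closed_account_at, which are different for each user.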

Is there a proper way to do this in a single query?

Example: https://www.db-fiddle.com/f/aDFuX4qjzCcUmXe8iipaBM/2

The result I get:

The result I want to see:

Query:

SELECT 
    usr.id as user_id,
    usr."onboardedAt" as user_opened_account_at,
    usr."closedAt" as user_closed_account_at,
    '2021-01-01' as start_range_date,
    '2021-01-08' as end_range_date,
    tx.tx_count as tx_count,
    last_user_action.action as last_user_action
FROM "Users" usr

LEFT JOIN (
    SELECT 
        "userId",
        COUNT("id") as "tx_count"
    FROM "Transactions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    GROUP BY "userId"
) tx ON usr.id = tx."userId"

LEFT JOIN (
    SELECT "userId", "action"
    FROM "UserActions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    ORDER BY "createdAt" DESC 
    LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"

WHERE usr.id = 1
ORDER BY user_id, start_range_date

Schema:

CREATE TABLE "Users" (
    id bigserial PRIMARY KEY,
    "onboardedAt" timestamp with time zone,
    "closedAt" timestamp with time zone
);

CREATE TABLE "Transactions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    amount numeric(20,8) NOT NULL DEFAULT 0
);

CREATE TABLE "UserActions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    action character varying(255) NOT NULL
);


INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-02-01', '2021-03-01' ), 
    ( '2021-02-01', '2021-03-01' );

INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES 
    ( 1, '2021-01-02',  100 ), 
    ( 1, '2021-01-08', -100 ), 
    ( 1, '2021-01-15', -200 ),
    ( 1, '2021-01-22',  200 ),

    ( 2, '2021-01-02', -100 ), 
    ( 2, '2021-01-02',  100 ), 
    ( 2, '2021-01-15', -200 ),
    ( 2, '2021-01-16',  200 ), 

    ( 3, '2021-01-02',  100 ), 
    ( 3, '2021-01-08', -100 ), 
    ( 3, '2021-01-15', -200 ),
    ( 3, '2021-01-22',  200 ),

    ( 4, '2021-02-02',   50 ), 
    ( 4, '2021-02-08', -100 ), 
    ( 4, '2021-02-15', -200 ),
    ( 4, '2021-02-22',  200 ),

    ( 5, '2021-02-02',  200 ), 
    ( 5, '2021-02-08', -400 ), 
    ( 5, '2021-02-15', -600 ),
    ( 5, '2021-02-22',  200 );

INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-02', 'DEPOSIT' ), 
    ( 1, '2021-01-08', 'DEPOSIT' ), 
    ( 1, '2021-01-09', 'PLAY' ), 
    ( 1, '2021-01-15', 'PLAY' ), 
    ( 1, '2021-01-22', 'PLAY' ), 

    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-02', 'DEPOSIT' ), 
    ( 2, '2021-01-08', 'DEPOSIT' ), 
    ( 2, '2021-01-09', 'PLAY' ), 
    ( 2, '2021-01-15', 'PLAY' ), 
    ( 2, '2021-01-22', 'PLAY' ), 

    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-02', 'DEPOSIT' ), 
    ( 3, '2021-01-08', 'DEPOSIT' ), 
    ( 3, '2021-01-09', 'PLAY' ), 
    ( 3, '2021-01-15', 'PLAY' ), 
    ( 3, '2021-01-22', 'PLAY' ), 

    ( 4, '2021-02-01', 'DEPOSIT' ), 
    ( 4, '2021-02-01', 'PLAY' ), 
    ( 4, '2021-02-02', 'DEPOSIT' ), 
    ( 4, '2021-02-08', 'DEPOSIT' ), 
    ( 4, '2021-02-09', 'PLAY' ), 
    ( 4, '2021-02-15', 'PLAY' ), 
    ( 4, '2021-02-22', 'PLAY' ), 

    ( 5, '2021-02-01', 'DEPOSIT' ), 
    ( 5, '2021-02-01', 'PLAY' ), 
    ( 5, '2021-02-02', 'PLAY' ), 
    ( 5, '2021-02-08', 'PLAY' ), 
    ( 5, '2021-02-09', 'PLAY' ), 
    ( 5, '2021-02-15', 'DEPOSIT' ), 
    ( 5, '2021-02-22', 'PLAY' );

Answer 1

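Sure. You have to use a LATERAL join so that the generate_series() table expression can reference column values of the left-hand table (users), but other than that it is mostly what you would expect. Below is some simplified SQL showing the important parts. If you want complete code, please add a dbfiddle link with sample data.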

SELECT u.user_id, week_start, count(t.id) AS tx_count  -- assumes transactions has an id column
FROM users AS u
CROSS JOIN LATERAL generate_series(u.onboarded_at, u.account_closed_at, interval '1 week')
    AS week_start
LEFT JOIN transactions AS t
  ON  t.user_id = u.user_id          -- only this user's transactions
  AND t.created_at >= week_start
  AND t.created_at < (week_start + interval '1 week')
GROUP BY 1, 2;

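Note that this is still mostly a glorified server-side for loop, but it is almost always much faster than a for loop in your application code that makes a round trip to the database on every iteration.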
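
Applied to the schema from the question, the same idea might look like the following sketch. It assumes the quoted table and column names from the question's DDL and that each 7-day window starts at the user's "onboardedAt". The alias w(week_start) and the output column names are chosen here for illustration. It only covers the transaction count, not last_user_action.

```sql
-- Sketch: one row per user and 7-day window, counted from "onboardedAt".
SELECT u.id                             AS user_id,
       w.week_start                     AS start_range_date,
       w.week_start + interval '1 week' AS end_range_date,
       count(t.id)                      AS tx_count   -- 0 for windows without transactions
FROM   "Users" AS u
CROSS  JOIN LATERAL generate_series(u."onboardedAt", u."closedAt", interval '1 week') AS w(week_start)
LEFT   JOIN "Transactions" AS t
       ON  t."userId"    =  u.id
       AND t."createdAt" >= w.week_start
       AND t."createdAt" <  w.week_start + interval '1 week'
GROUP  BY u.id, w.week_start
ORDER  BY u.id, w.week_start;
```

Because generate_series() steps in whole weeks, the last window of a user can extend past "closedAt". Clamp end_range_date with LEAST(..., u."closedAt") if that matters.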

Answer 2

With all weeks starting on Monday, this does it (efficiently):

SELECT id AS user_id, u."onboardedAt", u."closedAt"
     , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM   "Users" u
CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT   JOIN (
   SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
   FROM   "Transactions" t
   GROUP  BY 1, 2
   ) t USING (id, week_start)
LEFT   JOIN (
   SELECT DISTINCT ON (1, 2)
          "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
   FROM   "UserActions" a
   ORDER  BY 1, 2, "createdAt" DESC
   ) a USING (id, week_start)
ORDER  BY id, week_start;

db<>fiddle here

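Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Otherwise, multiple joins can go wrong quickly. See: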

Two SQL LEFT JOINS produce incorrect result
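
To illustrate what "going wrong" means here, the sketch below joins both "many" tables to "Users" directly and only then counts. It runs against the sample data from the question. The column aliases are illustrative.

```sql
-- Pitfall: each transaction is paired with each action of the same user,
-- so a plain count() over the joined rows is multiplied.
SELECT u.id                 AS user_id,
       count(t.id)          AS tx_count_inflated,  -- transactions x actions
       count(DISTINCT t.id) AS tx_count            -- correct, but the join still builds every pair
FROM   "Users" u
LEFT   JOIN "Transactions" t ON t."userId" = u.id
LEFT   JOIN "UserActions"  a ON a."userId" = u.id
GROUP  BY u.id
ORDER  BY u.id;
-- For user 1 in the sample data: 4 transactions x 7 actions = 28 vs. 4.
```

Aggregating each table in its own subquery first, as in the query above, avoids both the wrong count and the intermediate row explosion.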

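Standard weeks also make it easier to compare data. (Note that the first and the last week of each user can be truncated, that is, span fewer days. But that applies to the last week of each user in any case.)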
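
As a small illustration of how date_trunc('week', ...) snaps to Monday, here is a sketch using the two onboarding dates from the sample data. It uses plain timestamp literals so that the result does not depend on the session time zone. With the timestamptz columns of the real tables, truncation happens in the session's time zone.

```sql
SELECT d                     AS onboarded_at,
       to_char(d, 'Dy')      AS weekday,
       date_trunc('week', d) AS week_start
FROM  (VALUES (timestamp '2021-01-01'), (timestamp '2021-02-01')) AS v(d);
-- 2021-01-01 00:00:00 | Fri | 2020-12-28 00:00:00
-- 2021-02-01 00:00:00 | Mon | 2021-02-01 00:00:00
```

A user onboarded on Friday, 2021-01-01 therefore gets a first bucket that starts on 2020-12-28 but contains only three days of account activity.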

The LATERAL keyword is assumed automatically in a join to a set-returning function:

CROSS  JOIN  generate_series(...)
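
Spelled out against the question's schema, the two forms below should be equivalent. This is a minimal sketch meant only to show the optional keyword.

```sql
-- Explicit LATERAL:
SELECT u.id, week_start
FROM   "Users" u
CROSS  JOIN LATERAL generate_series(u."onboardedAt", u."closedAt", interval '1 week') AS week_start;

-- LATERAL omitted: a function in the FROM list may reference columns
-- of preceding FROM items either way.
SELECT u.id, week_start
FROM   "Users" u
CROSS  JOIN generate_series(u."onboardedAt", u."closedAt", interval '1 week') AS week_start;
```

A subquery in the same position, by contrast, needs the explicit LATERAL keyword to reference u.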

Use DISTINCT ON to get the last_user_action per user and week. See:

Select first row in each GROUP BY group?
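
In isolation, the DISTINCT ON step can be sketched like this: keep one row per user, namely the one that sorts first. The tiebreaker on id is an addition for this example, so that the pick is deterministic when two actions share the same "createdAt".

```sql
-- Latest action per user, over all time, from the question's sample table.
SELECT DISTINCT ON ("userId")
       "userId", action AS last_user_action, "createdAt"
FROM   "UserActions"
ORDER  BY "userId", "createdAt" DESC, id DESC;
```

The leading ORDER BY expressions must match the DISTINCT ON expressions. The remaining ones decide which row of each group survives.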

I advise using legal, lower-case identifiers, so that no double-quoting is required. It makes your life with Postgres easier.
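
As a sketch of what that could look like, here is the "Users" table with hypothetical snake_case names (users, onboarded_at and closed_at are illustrative, not from the question):

```sql
CREATE TABLE users (
    id           bigserial PRIMARY KEY,
    onboarded_at timestamptz,
    closed_at    timestamptz
);

-- Unquoted identifiers are folded to lower case, so both statements work:
SELECT id, onboarded_at, closed_at FROM users;
SELECT ID, Onboarded_At, Closed_At FROM Users;
```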

Use the last non-null action

Added in a comment:

if action is null in a current week, I want to grab most recent from previous weeks

SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
     , last_user_action AS last_user_action_with_null
     , COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM  (
   SELECT id AS user_id, u."onboardedAt", u."closedAt"
        , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
        , count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
   FROM   "Users" u
   CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
   LEFT   JOIN (
      SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
      FROM   "Transactions" t
      GROUP  BY 1, 2
      ) t  USING (id, week_start)
   LEFT   JOIN (
      SELECT DISTINCT ON (1, 2)
             "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
      FROM   "UserActions" a
      ORDER  BY 1, 2, "createdAt" DESC
      ) a USING (id, week_start)
   ) sub
ORDER  BY user_id, week_start;

db<>fiddle here

Explanation:

Retrieve last known value for each column of a row
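
The null_grp trick can be seen in isolation with a few made-up rows. The VALUES list and the names week_no and filled_action are illustrative only. count(action) ignores NULLs, so the running count only increases on rows that have an action. Every NULL row therefore lands in the same group as the most recent non-null row before it.

```sql
SELECT week_no, action, null_grp,
       COALESCE(action, max(action) OVER (PARTITION BY null_grp)) AS filled_action
FROM  (
   SELECT week_no, action,
          count(action) OVER (ORDER BY week_no) AS null_grp  -- running count of non-null actions
   FROM  (VALUES (1, 'PLAY'), (2, NULL), (3, NULL), (4, 'DEPOSIT'), (5, NULL)) AS v(week_no, action)
   ) sub
ORDER  BY week_no;
-- week_no | action  | null_grp | filled_action
--       1 | PLAY    |        1 | PLAY
--       2 | NULL    |        1 | PLAY
--       3 | NULL    |        1 | PLAY
--       4 | DEPOSIT |        2 | DEPOSIT
--       5 | NULL    |        2 | DEPOSIT
```

Each group contains at most one non-null value, so max() simply picks it. Rows before the first non-null action fall into group 0 and stay NULL.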
