SQL: group query results by user id and date ranges dynamically

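I have a query that collects information from different tables for a specific time range.

Currently I fire a separate request for each user and each date range, but I would like to run all time ranges at once. The time ranges are each 7 days between user_opened_account_at and user_closed_account_at, which differ per user.

Is there a proper way to do this in a single query?

Example: https://www.db-fiddle.com/f/aDFuX4qjzCcUmXe8iipaBM/2

The result I get:

The result I want to see:

Query: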

SELECT 
    usr.id as user_id,
    usr."onboardedAt" as user_opened_account_at,
    usr."closedAt" as user_closed_account_at,
    '2021-01-01' as start_range_date,
    '2021-01-08' as end_range_date,
    tx.tx_count as tx_count,
    last_user_action.action as last_user_action
FROM "Users" usr

LEFT JOIN (
    SELECT 
        "userId",
        COUNT("id") as "tx_count"
    FROM "Transactions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    GROUP BY "userId"
) tx ON usr.id = tx."userId"

LEFT JOIN (
    SELECT "userId", "action"
    FROM "UserActions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    ORDER BY "createdAt" DESC 
    LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"

WHERE usr.id = 1
ORDER BY user_id, start_range_date

Schema:

CREATE TABLE "Users" (
    id bigserial PRIMARY KEY,
    "onboardedAt" timestamp with time zone,
    "closedAt" timestamp with time zone
);

CREATE TABLE "Transactions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    amount numeric(20,8) NOT NULL DEFAULT 0
);

CREATE TABLE "UserActions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    action character varying(255) NOT NULL
);


INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-02-01', '2021-03-01' ), 
    ( '2021-02-01', '2021-03-01' );

INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES 
    ( 1, '2021-01-02',  100 ), 
    ( 1, '2021-01-08', -100 ), 
    ( 1, '2021-01-15', -200 ),
    ( 1, '2021-01-22',  200 ),

    ( 2, '2021-01-02', -100 ), 
    ( 2, '2021-01-02',  100 ), 
    ( 2, '2021-01-15', -200 ),
    ( 2, '2021-01-16',  200 ), 

    ( 3, '2021-01-02',  100 ), 
    ( 3, '2021-01-08', -100 ), 
    ( 3, '2021-01-15', -200 ),
    ( 3, '2021-01-22',  200 ),

    ( 4, '2021-02-02',   50 ), 
    ( 4, '2021-02-08', -100 ), 
    ( 4, '2021-02-15', -200 ),
    ( 4, '2021-02-22',  200 ),

    ( 5, '2021-02-02',  200 ), 
    ( 5, '2021-02-08', -400 ), 
    ( 5, '2021-02-15', -600 ),
    ( 5, '2021-02-22',  200 );

INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-02', 'DEPOSIT' ), 
    ( 1, '2021-01-08', 'DEPOSIT' ), 
    ( 1, '2021-01-09', 'PLAY' ), 
    ( 1, '2021-01-15', 'PLAY' ), 
    ( 1, '2021-01-22', 'PLAY' ), 

    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-02', 'DEPOSIT' ), 
    ( 2, '2021-01-08', 'DEPOSIT' ), 
    ( 2, '2021-01-09', 'PLAY' ), 
    ( 2, '2021-01-15', 'PLAY' ), 
    ( 2, '2021-01-22', 'PLAY' ), 

    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-02', 'DEPOSIT' ), 
    ( 3, '2021-01-08', 'DEPOSIT' ), 
    ( 3, '2021-01-09', 'PLAY' ), 
    ( 3, '2021-01-15', 'PLAY' ), 
    ( 3, '2021-01-22', 'PLAY' ), 

    ( 4, '2021-02-01', 'DEPOSIT' ), 
    ( 4, '2021-02-01', 'PLAY' ), 
    ( 4, '2021-02-02', 'DEPOSIT' ), 
    ( 4, '2021-02-08', 'DEPOSIT' ), 
    ( 4, '2021-02-09', 'PLAY' ), 
    ( 4, '2021-02-15', 'PLAY' ), 
    ( 4, '2021-02-22', 'PLAY' ), 

    ( 5, '2021-02-01', 'DEPOSIT' ), 
    ( 5, '2021-02-01', 'PLAY' ), 
    ( 5, '2021-02-02', 'PLAY' ), 
    ( 5, '2021-02-08', 'PLAY' ), 
    ( 5, '2021-02-09', 'PLAY' ), 
    ( 5, '2021-02-15', 'DEPOSIT' ), 
    ( 5, '2021-02-22', 'PLAY' );

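Sure. You have to use a LATERAL join so that you can use column values from the left-hand table (users) in the generate_series() table expression, but other than that it is mostly what you would expect. Below is some simplified SQL showing the important parts. If you want the full code, add a dbfiddle link with sample data.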

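-- Simplified sketch: table and column names are placeholders, not the schema from the question.
SELECT u.user_id, week_start, count(t.id) AS tx_count   -- count(t.id) yields 0 for weeks without transactions
FROM   users AS u
CROSS  JOIN LATERAL generate_series(u.onboarded_at, u.account_closed_at, interval '1 week')
    AS week_start
LEFT   JOIN transactions AS t
  ON  t.user_id = u.user_id                              -- only this user's transactions
  AND t.created_at >= week_start
  AND t.created_at < (week_start + interval '1 week')
GROUP  BY 1, 2;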

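Note that this is still mostly a glorified for loop on the server side, but it will almost always perform much better than a for loop in your code that makes a round trip to the database on every iteration.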
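To see the per-user windows on their own, here is a minimal self-contained sketch. The inline `users` rows and their column names are made up for illustration and are not the schema from the question.

```sql
-- Hypothetical inline data standing in for a users table.
WITH users(user_id, onboarded_at, closed_at) AS (
   VALUES (1, timestamp '2021-01-01', timestamp '2021-01-20')
        , (2, timestamp '2021-02-01', timestamp '2021-02-10')
)
SELECT u.user_id
     , w.week_start
     , w.week_start + interval '1 week' AS week_end   -- exclusive upper bound
FROM   users u
-- LATERAL lets generate_series() read columns of the current row of "users".
CROSS  JOIN LATERAL generate_series(u.onboarded_at, u.closed_at, interval '1 week') AS w(week_start)
ORDER  BY u.user_id, w.week_start;
```

User 1 gets three windows (starting 2021-01-01, 2021-01-08 and 2021-01-15) and user 2 gets two (2021-02-01 and 2021-02-08), because generate_series() stops once the next step would pass the end value.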

With all weeks starting on Monday, this does it:

SELECT id AS user_id, u."onboardedAt", u."closedAt"
     , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM   "Users" u
CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT   JOIN (
   SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
   FROM   "Transactions" t
   GROUP  BY 1, 2
   ) t USING (id, week_start)
LEFT   JOIN (
   SELECT DISTINCT ON (1, 2)
          "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
   FROM   "UserActions" a
   ORDER  BY 1, 2, "createdAt" DESC
   ) a USING (id, week_start)
ORDER  BY id, week_start;

db<>fiddle here

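Working with standard weeks makes everything a lot simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Otherwise, multiple joins can go wrong quickly. See: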

  • Two SQL LEFT JOINS produce incorrect result
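A small sketch of why aggregating before the join matters. The tables `u`, `tx` and `act` are hypothetical inline data: one user with two transactions and three actions.

```sql
WITH u(id)        AS (VALUES (1))
   , tx(user_id)  AS (VALUES (1), (1))        -- 2 transactions
   , act(user_id) AS (VALUES (1), (1), (1))   -- 3 actions
SELECT (SELECT count(tx.user_id)
        FROM   u
        LEFT   JOIN tx  ON tx.user_id  = u.id
        LEFT   JOIN act ON act.user_id = u.id) AS naive_tx_count          -- 2 x 3 = 6 joined rows
     , (SELECT t.tx_count
        FROM   u
        LEFT   JOIN (
           SELECT user_id, count(*) AS tx_count
           FROM   tx
           GROUP  BY 1
           ) t ON t.user_id = u.id)            AS preaggregated_tx_count; -- 2
```

Joining both "many" tables directly multiplies their rows per user, so the naive count comes out as 6 instead of 2. Aggregating each table in its own subquery first keeps one row per key and avoids that.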

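Standard weeks also make it easier to compare data. (Note that the first and last week of each user can be truncated, spanning fewer days. But that applies to the last week of each user in any case.)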
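As a quick illustration of how date_trunc('week', ...) snaps a value to the Monday of its ISO week, here is a sketch with three sample dates of my own choosing:

```sql
SELECT d::date                        AS day
     , to_char(d, 'Dy')               AS weekday
     , date_trunc('week', d)::date    AS week_start
FROM  (VALUES (timestamp '2021-01-01')    -- Fri -> 2020-12-28
            , (timestamp '2021-01-03')    -- Sun -> 2020-12-28
            , (timestamp '2021-01-04')    -- Mon -> 2021-01-04
      ) AS v(d);
```

A user onboarded on Friday 2021-01-01 therefore lands in the week starting 2020-12-28, and only three days of that first week fall inside the account's lifetime. With timestamp with time zone columns, as in the question's schema, the truncation follows the session's time zone setting.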

The LATERAL keyword is assumed automatically in a join to a set-returning function:

CROSS  JOIN  generate_series(...)

Use DISTINCT ON to get the last_user_action per user and week. See:

  • Select first row in each GROUP BY group?
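A minimal sketch of the DISTINCT ON pattern on its own, using made-up inline rows in place of "UserActions":

```sql
WITH actions(user_id, created_at, action) AS (
   VALUES (1, timestamp '2021-01-04', 'PLAY')
        , (1, timestamp '2021-01-06', 'DEPOSIT')   -- latest in the week of 2021-01-04
        , (1, timestamp '2021-01-12', 'PLAY')
        , (2, timestamp '2021-01-05', 'PLAY')
)
SELECT DISTINCT ON (user_id, date_trunc('week', created_at))
       user_id
     , date_trunc('week', created_at) AS week_start
     , action                         AS last_user_action
FROM   actions
-- The leading ORDER BY items must match DISTINCT ON; created_at DESC picks the latest row per group.
ORDER  BY user_id, date_trunc('week', created_at), created_at DESC;
```

This keeps one row per (user, week): DEPOSIT for user 1 in the week of 2021-01-04, PLAY for user 1 in the week of 2021-01-11, and PLAY for user 2 in the week of 2021-01-04.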

I advise using legal, lower-case identifiers so that no double-quoting is required. It makes your life with Postgres easier.

Use the last non-null action

Added in a comment:

if action is null in a current week, I want to grab most recent from previous weeks

SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
     , last_user_action AS last_user_action_with_null
     , COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM  (
   SELECT id AS user_id, u."onboardedAt", u."closedAt"
        , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
        , count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
   FROM   "Users" u
   CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
   LEFT   JOIN (
      SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
      FROM   "Transactions" t
      GROUP  BY 1, 2
      ) t  USING (id, week_start)
   LEFT   JOIN (
      SELECT DISTINCT ON (1, 2)
             "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
      FROM   "UserActions" a
      ORDER  BY 1, 2, "createdAt" DESC
      ) a USING (id, week_start)
   ) sub
ORDER  BY user_id, week_start;

db<>fiddle here

Explanation:

  • Retrieve last known value for each column of a row
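The gap-filling trick in isolation, as a sketch on hypothetical weekly rows: count() ignores NULL, so a running count of non-null actions only increases on weeks that have an action, and every following NULL week shares that group number.

```sql
WITH weekly(user_id, week_start, last_user_action) AS (
   VALUES (1, date '2021-01-04', 'DEPOSIT')
        , (1, date '2021-01-11', NULL)
        , (1, date '2021-01-18', NULL)
        , (1, date '2021-01-25', 'PLAY')
)
SELECT user_id, week_start, last_user_action, null_grp
       -- Each group holds at most one non-null value, so max() simply returns it.
     , max(last_user_action) OVER (PARTITION BY user_id, null_grp) AS filled_action
FROM  (
   SELECT *
        , count(last_user_action) OVER (PARTITION BY user_id ORDER BY week_start) AS null_grp
   FROM   weekly
   ) sub
ORDER  BY user_id, week_start;
```

Here null_grp comes out as 1, 1, 1, 2 and filled_action as DEPOSIT, DEPOSIT, DEPOSIT, PLAY. Weeks before a user's first action would fall into group 0 and stay NULL, since there is nothing earlier to carry forward.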