SQL: group query results by user id and date ranges dynamically
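I have a query that collects information from different tables for a specific time range.
Currently I issue a separate request for each user and each date range, but I would like to run all time ranges at once. The time ranges are every 7 days between the date the user opened the account and user_closed_account_at, and they differ for every user.
Is there a proper way to do this in a single query?
Example: https://www.db-fiddle.com/f/aDFuX4qjzCcUmXe8iipaBM/2
The result I get:
The result I want to see:
Query: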
SELECT
usr.id as user_id,
usr."onboardedAt" as user_opened_account_at,
usr."closedAt" as user_closed_account_at,
'2021-01-01' as start_range_date,
'2021-01-08' as end_range_date,
tx.tx_count as tx_count,
last_user_action.action as last_user_action
FROM "Users" usr
LEFT JOIN (
SELECT
"userId",
COUNT("id") as "tx_count"
FROM "Transactions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
GROUP BY "userId"
) tx ON usr.id = tx."userId"
LEFT JOIN (
SELECT "userId", "action"
FROM "UserActions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
ORDER BY "createdAt" DESC
LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"
WHERE usr.id = 1
ORDER BY user_id, start_range_date
Schema:
CREATE TABLE "Users" (
id bigserial PRIMARY KEY,
"onboardedAt" timestamp with time zone,
"closedAt" timestamp with time zone
);
CREATE TABLE "Transactions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
amount numeric(20,8) NOT NULL DEFAULT 0
);
CREATE TABLE "UserActions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
action character varying(255) NOT NULL
);
INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-02-01', '2021-03-01' ),
( '2021-02-01', '2021-03-01' );
INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES
( 1, '2021-01-02', 100 ),
( 1, '2021-01-08', -100 ),
( 1, '2021-01-15', -200 ),
( 1, '2021-01-22', 200 ),
( 2, '2021-01-02', -100 ),
( 2, '2021-01-02', 100 ),
( 2, '2021-01-15', -200 ),
( 2, '2021-01-16', 200 ),
( 3, '2021-01-02', 100 ),
( 3, '2021-01-08', -100 ),
( 3, '2021-01-15', -200 ),
( 3, '2021-01-22', 200 ),
( 4, '2021-02-02', 50 ),
( 4, '2021-02-08', -100 ),
( 4, '2021-02-15', -200 ),
( 4, '2021-02-22', 200 ),
( 5, '2021-02-02', 200 ),
( 5, '2021-02-08', -400 ),
( 5, '2021-02-15', -600 ),
( 5, '2021-02-22', 200 );
INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-02', 'DEPOSIT' ),
( 1, '2021-01-08', 'DEPOSIT' ),
( 1, '2021-01-09', 'PLAY' ),
( 1, '2021-01-15', 'PLAY' ),
( 1, '2021-01-22', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-02', 'DEPOSIT' ),
( 2, '2021-01-08', 'DEPOSIT' ),
( 2, '2021-01-09', 'PLAY' ),
( 2, '2021-01-15', 'PLAY' ),
( 2, '2021-01-22', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-02', 'DEPOSIT' ),
( 3, '2021-01-08', 'DEPOSIT' ),
( 3, '2021-01-09', 'PLAY' ),
( 3, '2021-01-15', 'PLAY' ),
( 3, '2021-01-22', 'PLAY' ),
( 4, '2021-02-01', 'DEPOSIT' ),
( 4, '2021-02-01', 'PLAY' ),
( 4, '2021-02-02', 'DEPOSIT' ),
( 4, '2021-02-08', 'DEPOSIT' ),
( 4, '2021-02-09', 'PLAY' ),
( 4, '2021-02-15', 'PLAY' ),
( 4, '2021-02-22', 'PLAY' ),
( 5, '2021-02-01', 'DEPOSIT' ),
( 5, '2021-02-01', 'PLAY' ),
( 5, '2021-02-02', 'PLAY' ),
( 5, '2021-02-08', 'PLAY' ),
( 5, '2021-02-09', 'PLAY' ),
( 5, '2021-02-15', 'DEPOSIT' ),
( 5, '2021-02-22', 'PLAY' );
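Sure. You have to use a LATERAL join so that you can use column values of the left-hand table (users) in the generate_series() table expression, but other than that it is mostly what you would expect. Below is some simplified SQL showing the important parts. If you want the full code, please add a dbfiddle link with sample data.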
SELECT u.user_id, week_start, count(t.id) AS tx_count  -- count(t.id) yields 0 for weeks without transactions
FROM users AS u
CROSS JOIN LATERAL generate_series(u.onboarded_at, u.account_closed_at, interval '1 week')
AS week_start
LEFT JOIN transactions AS t
ON t.user_id = u.user_id  -- t.id and t.user_id are assumed columns of this simplified schema
AND t.created_at >= week_start
AND t.created_at < (week_start + interval '1 week')
GROUP BY 1, 2;
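Note that this is still mostly a glorified server-side for loop, but it almost always performs far better than a for loop in your own code that makes a round trip to the database on every iteration.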
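If the ranges have to be exact 7-day windows counted from each user's own start date, as the question describes, the same LATERAL idea could be applied to the question's tables roughly as follows. This is only a sketch and has not been run against the fiddle. The aliases r, tx and la are made up for illustration, the tie-breaker on a.id is an assumption, and users whose "closedAt" is NULL would produce no rows at all.

```sql
-- Sketch: exact 7-day windows anchored at each user's "onboardedAt",
-- using the table and column names from the question's schema.
SELECT u.id                              AS user_id,
       r.range_start                     AS start_range_date,
       r.range_start + interval '7 days' AS end_range_date,
       tx.tx_count,
       la.action                         AS last_user_action
FROM "Users" u
CROSS JOIN LATERAL generate_series(u."onboardedAt", u."closedAt", interval '7 days') AS r(range_start)
CROSS JOIN LATERAL (
    -- An aggregate without GROUP BY always returns one row, so empty windows show 0.
    SELECT count(*) AS tx_count
    FROM "Transactions" t
    WHERE t."userId" = u.id
      AND t."createdAt" >= r.range_start
      AND t."createdAt" <  r.range_start + interval '7 days'
) tx
LEFT JOIN LATERAL (
    -- Latest action inside the window; a.id breaks ties (assumption).
    SELECT a.action
    FROM "UserActions" a
    WHERE a."userId" = u.id
      AND a."createdAt" >= r.range_start
      AND a."createdAt" <  r.range_start + interval '7 days'
    ORDER BY a."createdAt" DESC, a.id DESC
    LIMIT 1
) la ON true
WHERE r.range_start < u."closedAt"  -- skip a window that would start exactly at closing time
ORDER BY user_id, start_range_date;
```

The last window of a user may extend past "closedAt". Whether that matters depends on whether rows can exist after the account was closed.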
With standard weeks, all starting on Monday, this does it (and works):
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
ORDER BY id, week_start;
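Using standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is both simpler and cheaper. Otherwise, multiple joins can go wrong quickly. See: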
- Two SQL LEFT JOINS produce incorrect result
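To illustrate why aggregating first matters, here is a small sketch of the failure mode, assuming the sample data from the question. Joining both "many" tables to "Users" before counting pairs every transaction with every action of the same user, so a plain count is inflated. The output column names are made up for illustration.

```sql
-- Naive version: both joins happen before the aggregate, so rows multiply.
SELECT u.id                 AS user_id,
       count(t.id)          AS tx_count_inflated,  -- transactions x actions
       count(DISTINCT t.id) AS tx_count            -- correct, but only patched up after the fact
FROM "Users" u
LEFT JOIN "Transactions" t ON t."userId" = u.id
LEFT JOIN "UserActions"  a ON a."userId" = u.id
GROUP BY u.id
ORDER BY u.id;
```

With the sample rows, user 1 has 4 transactions and 7 actions, so tx_count_inflated comes out as 28 while tx_count is 4. Aggregating each table in its own subquery, as the query above does, avoids producing the 28 intermediate rows in the first place.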
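Standard weeks also make it easier to compare data. (Note that the first and last week of each user can be truncated, that is, span fewer days. But that applies to the last week of each user in any case.)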
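If the truncated weeks need to be visible in the output, one possible approach is to clip every standard week to the user's own lifetime. This is a sketch under the question's schema. The column names effective_start and effective_end are made up for illustration.

```sql
-- Clip each Monday-based week to the span between "onboardedAt" and "closedAt".
SELECT u.id AS user_id,
       week_start,
       GREATEST(week_start, u."onboardedAt")               AS effective_start,
       LEAST(week_start + interval '1 week', u."closedAt") AS effective_end
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
ORDER BY 1, 2;
```

For user 1 in the sample data, "onboardedAt" is 2021-01-01, a Friday, so the first week_start should be Monday 2020-12-28 while effective_start stays at 2021-01-01. When "closedAt" itself falls on a Monday, the series ends with a week whose effective_start equals its effective_end. Such a row can be filtered out if it is not wanted.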
The LATERAL keyword is assumed automatically when joining to a set-returning function:
CROSS JOIN generate_series(...)
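As a brief illustration of that rule, the first two statements below should behave identically in PostgreSQL, while a subquery needs the keyword spelled out before it can reference columns of u. The alias last_tx is made up for this sketch.

```sql
-- Explicit LATERAL with a set-returning function ...
SELECT u.id, week_start
FROM "Users" u
CROSS JOIN LATERAL generate_series(u."onboardedAt", u."closedAt", interval '1 week') AS week_start;

-- ... and the same join with the keyword left out.
SELECT u.id, week_start
FROM "Users" u
CROSS JOIN generate_series(u."onboardedAt", u."closedAt", interval '1 week') AS week_start;

-- A subquery, by contrast, only sees u.id because LATERAL is written out.
SELECT u.id, last_tx."createdAt"
FROM "Users" u
LEFT JOIN LATERAL (
    SELECT t."createdAt"
    FROM "Transactions" t
    WHERE t."userId" = u.id
    ORDER BY t."createdAt" DESC
    LIMIT 1
) last_tx ON true;
```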
Use DISTINCT ON to get the last_user_action per user and week. See:
- Select first row in each GROUP BY group?
I suggest using legal, lower-case identifiers so that double-quoting is not required. It makes your life with Postgres easier.
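As a sketch of what that could look like for the "Users" table, the statements below rename the mixed-case identifiers. The snake_case names are only a suggestion, and every query in this thread would have to be adjusted to match.

```sql
-- Rename the columns first, while the table still has its quoted name.
ALTER TABLE "Users" RENAME COLUMN "onboardedAt" TO onboarded_at;
ALTER TABLE "Users" RENAME COLUMN "closedAt" TO closed_at;
ALTER TABLE "Users" RENAME TO users;

-- No double quotes needed afterwards.
SELECT id, onboarded_at, closed_at
FROM users;
```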
Use the last non-null action
Added in a comment:
if action is null in a current week, I want to grab most recent from previous weeks
SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
, last_user_action AS last_user_action_with_null
, COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM (
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
, count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
) sub
ORDER BY user_id, week_start;
Explanation:
- Retrieve last known value for each column of a row
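The null_grp trick can be tried in isolation. The following sketch uses made-up inline data instead of the question's tables. count(action) ignores NULLs, so the running count only increases on rows that carry a value, and every NULL row ends up in the same group as the last non-null row before it.

```sql
SELECT week_no, action,
       max(action) OVER (PARTITION BY null_grp) AS filled_action
FROM (
    SELECT week_no, action,
           -- running count of non-null actions: 1, 1, 1, 2, 2 for the rows below
           count(action) OVER (ORDER BY week_no) AS null_grp
    FROM (VALUES (1, 'PLAY'), (2, NULL), (3, NULL), (4, 'DEPOSIT'), (5, NULL)) AS v(week_no, action)
) sub
ORDER BY week_no;
```

filled_action comes out as PLAY, PLAY, PLAY, DEPOSIT, DEPOSIT. Each group holds at most one non-null value, so max() simply picks it up. Rows before the first non-null action get null_grp 0 and stay NULL.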