Grouping SQL rows based on overlapping active time intervals, valid from and valid to
I am working in BigQuery with this mock data:
create schema if not exists dbo;
create table if not exists dbo.player_history(team_id INT, player_id INT, active_from TIMESTAMP, active_to TIMESTAMP);
truncate table dbo.player_history;
INSERT INTO dbo.player_history VALUES(1,1,'2020-01-01', '2020-01-08');
INSERT INTO dbo.player_history VALUES(1,2,'2020-06-01', '2020-09-08');
INSERT INTO dbo.player_history VALUES(1,3,'2020-06-10', '2020-10-01');
INSERT INTO dbo.player_history VALUES(1,4,'2020-02-01', '2020-02-15');
INSERT INTO dbo.player_history VALUES(1,5,'2021-01-01', '2021-01-08');
INSERT INTO dbo.player_history VALUES(1,6,'2021-01-02', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,7,'2021-01-03', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,8,'2021-01-04', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,9,'2020-01-02', '2021-02-05');
INSERT INTO dbo.player_history VALUES(1,10,'2020-10-01', '2021-04-08');
INSERT INTO dbo.player_history VALUES(1,11,'2020-11-01', '2021-05-08');
select *
from dbo.player_history
order by 3, 4
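What I want to get is the active lineups. The output would look like this:
The logic behind it is:
I almost cracked it with some kind of LEAD(valid_from) compared against valid_to and valid_from: flag a row as 1 when it starts a new lineup and 0 otherwise, then take a cumulative sum of that flag to get an ID. But I can't get it 100% right. I'm pretty desperate and don't know where else to look.
**Correction: lineups 4 and 5 should actually be a single lineup.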
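The flag-and-cumulative-sum idea described in the question can be sketched outside SQL to make the logic easy to check. The following is a minimal Python sketch, not a BigQuery query and not necessarily the output the question expects: the function name `group_overlapping` and the choice to compare each start against the running maximum of earlier end times (rather than only the previous row's end) are assumptions made for illustration. In SQL the same two steps would typically map to a window `MAX(...)` over preceding rows followed by a window `SUM(...)` of the flag.

```python
from datetime import date

# Mock rows copied from the question: (team_id, player_id, active_from, active_to)
ROWS = [
    (1, 1, date(2020, 1, 1), date(2020, 1, 8)),
    (1, 2, date(2020, 6, 1), date(2020, 9, 8)),
    (1, 3, date(2020, 6, 10), date(2020, 10, 1)),
    (1, 4, date(2020, 2, 1), date(2020, 2, 15)),
    (1, 5, date(2021, 1, 1), date(2021, 1, 8)),
    (1, 6, date(2021, 1, 2), date(2021, 6, 8)),
    (1, 7, date(2021, 1, 3), date(2021, 6, 8)),
    (1, 8, date(2021, 1, 4), date(2021, 6, 8)),
    (1, 9, date(2020, 1, 2), date(2021, 2, 5)),
    (1, 10, date(2020, 10, 1), date(2021, 4, 8)),
    (1, 11, date(2020, 11, 1), date(2021, 5, 8)),
]


def group_overlapping(rows):
    """Return (team_id, group_id, player_id) tuples (hypothetical helper).

    Rows of one team share a group_id when their intervals overlap or touch,
    directly or through a chain of other rows.
    """
    result = []
    running_end = {}  # team_id -> latest active_to seen so far in the current group
    group_id = {}     # team_id -> cumulative sum of the "new group" flag
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3]))
    for team_id, player_id, start, end in ordered:
        # Flag is 1 when this row starts after everything seen so far has ended.
        is_new = team_id not in running_end or start > running_end[team_id]
        group_id[team_id] = group_id.get(team_id, 0) + (1 if is_new else 0)
        if is_new:
            running_end[team_id] = end
        else:
            running_end[team_id] = max(running_end[team_id], end)
        result.append((team_id, group_id[team_id], player_id))
    return result


if __name__ == "__main__":
    for row in group_overlapping(ROWS):
        print(row)
```

On the mock data this puts every player in group 1, because player 9 is active from 2020-01-02 to 2021-02-05 and each later player starts before the running end time is reached. So grouping purely by interval overlap may not be enough to produce several distinct lineups for this data set.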
Given what we discussed in the comments, that a player can belong to multiple lineups, you can try the following approach using a JOIN:
WITH LINEUPS AS
(SELECT a.*,b.player_id as b_player_id
FROM `dbo.player_history` a
INNER JOIN `dbo.player_history` b
  ON b.team_id = a.team_id  -- only match players on the same team
  AND b.active_from BETWEEN a.active_from AND a.active_to)
SELECT
team_id,
ROW_NUMBER () OVER (PARTITION BY team_id ORDER BY active_from, active_to) AS lineup_id,
active_from,
active_to,
ARRAY_AGG(DISTINCT b_player_id) as player_ids
FROM LINEUPS
GROUP BY team_id, active_from, active_to
ORDER BY active_from, active_to
Because the output is too long to show in a screenshot of the BigQuery console, I exported the results to Google Sheets. See the output screenshot below: