Add column with rolling calculation group by
I have a table like this:
current_date | user_id | mode_name | mode_time |
---|---|---|---|
2021-10-01 | 1 | game | 10 |
2021-10-02 | 1 | game | 10 |
2021-10-02 | 1 | tv | 30 |
2021-10-09 | 1 | music | 10 |
2021-10-15 | 1 | music | 40 |
2021-10-01 | 2 | music | 10 |
2021-10-01 | 2 | game | 10 |
2021-10-04 | 2 | game | 10 |
2021-10-04 | 2 | music | 20 |
2021-10-05 | 2 | tv | 40 |
2021-10-11 | 2 | tv | 40 |
2021-10-12 | 2 | game | 20 |
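I want to add two columns:
- a column with the favourite `mode_name` for each `user_id`, based on the cumulative sum of the `mode_time` column
- a column with the cumulative sum of the `mode_time` column for that user's favourite `mode_name`

The desired table should look like this: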
current_date | user_id | mode_name | mode_time | favourite_mode | favourite_mode_time |
---|---|---|---|---|---|
2021-10-01 | 1 | game | 10 | game | 10 |
2021-10-02 | 1 | game | 10 | tv | 30 |
2021-10-02 | 1 | tv | 30 | tv | 30 |
2021-10-09 | 1 | music | 10 | tv | 30 |
2021-10-15 | 1 | music | 40 | music | 50 |
2021-10-01 | 2 | music | 10 | game | 10 |
2021-10-01 | 2 | game | 10 | game | 10 |
2021-10-04 | 2 | game | 10 | music | 30 |
2021-10-04 | 2 | music | 20 | music | 30 |
2021-10-05 | 2 | tv | 40 | tv | 40 |
2021-10-11 | 2 | tv | 40 | tv | 80 |
2021-10-12 | 2 | game | 20 | tv | 80 |
The table can be found here: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=e05302a2cfd81a2a55de811e294f513e
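Below is a minimal Python sketch of what the two new columns appear to mean. You can use it to check any SQL attempt against the desired table. It rests on two assumptions that the question does not state outright. First, all rows on the same date are counted before the favourite is picked, which is what the two 2021-10-02 rows for user 1 suggest. Second, ties are broken alphabetically, which happens to reproduce `game` for user 2 on 2021-10-01. The name `add_favourite_columns` is hypothetical.

```python
from collections import defaultdict

# Input rows from the question: (current_date, user_id, mode_name, mode_time).
ROWS = [
    ("2021-10-01", 1, "game", 10),
    ("2021-10-02", 1, "game", 10),
    ("2021-10-02", 1, "tv", 30),
    ("2021-10-09", 1, "music", 10),
    ("2021-10-15", 1, "music", 40),
    ("2021-10-01", 2, "music", 10),
    ("2021-10-01", 2, "game", 10),
    ("2021-10-04", 2, "game", 10),
    ("2021-10-04", 2, "music", 20),
    ("2021-10-05", 2, "tv", 40),
    ("2021-10-11", 2, "tv", 40),
    ("2021-10-12", 2, "game", 20),
]


def add_favourite_columns(rows):
    """Hypothetical helper: append (favourite_mode, favourite_mode_time) to each row."""
    # Group row indices by (user_id, date) so rows sharing a date are handled together.
    by_user_date = defaultdict(list)
    for i, (day, user, _mode, _minutes) in enumerate(rows):
        by_user_date[(user, day)].append(i)

    totals = defaultdict(lambda: defaultdict(int))  # user_id -> mode_name -> running total
    out = [None] * len(rows)
    # ISO-8601 date strings sort chronologically, so sorted keys walk each user's dates in order.
    for user, day in sorted(by_user_date):
        idxs = by_user_date[(user, day)]
        # Assumption: every row on this date is added before the favourite is picked.
        for i in idxs:
            _day, _user, mode, minutes = rows[i]
            totals[user][mode] += minutes
        # Largest total wins; ties broken alphabetically (assumption, not from the question).
        fav_mode, fav_time = min(totals[user].items(), key=lambda kv: (-kv[1], kv[0]))
        for i in idxs:
            out[i] = rows[i] + (fav_mode, fav_time)
    return out


if __name__ == "__main__":
    for row in add_favourite_columns(ROWS):
        print(row)
```

The result keeps the original row order, so it lines up row by row with the desired table.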
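You can use `sum` over a window partitioned by user and mode to calculate the rolling total for each mode. Then use `max` and `max_by` in the outer select to get the corresponding values: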
```sql
-- sample data
WITH dataset (date, user_id, mode_name, mode_time) AS (
    values ('2021-10-01', 1, 'game', 10),
           ('2021-10-02', 1, 'game', 10),
           ('2021-10-02', 1, 'tv', 30),
           ('2021-10-09', 1, 'music', 10),
           ('2021-10-15', 1, 'music', 40),
           ('2021-10-01', 2, 'game', 10),
           ('2021-10-01', 2, 'music', 10),
           ('2021-10-04', 2, 'game', 10),
           ('2021-10-04', 2, 'music', 20),
           ('2021-10-05', 2, 'tv', 40),
           ('2021-10-11', 2, 'tv', 40),
           ('2021-10-12', 2, 'game', 20)
)
-- query
SELECT date, user_id, mode_name, mode_time,
       -- mode whose running total is the highest so far for this user
       max_by(mode_name, mode_time_rolling_time) OVER (
           PARTITION BY user_id
           ORDER BY date
       ) AS favourite_mode,
       -- that highest running total
       max(mode_time_rolling_time) OVER (
           PARTITION BY user_id
           ORDER BY date
       ) AS favourite_mode_time
FROM (
    -- running total of mode_time per user and mode, ordered by date
    SELECT *,
           sum(mode_time) OVER (
               PARTITION BY user_id, mode_name
               ORDER BY date
           ) AS mode_time_rolling_time
    FROM dataset
) AS t
ORDER BY user_id, date
```
Output:
date | user_id | mode_name | mode_time | favourite_mode | favourite_mode_time |
---|---|---|---|---|---|
2021-10-01 | 1 | game | 10 | game | 10 |
2021-10-02 | 1 | game | 10 | tv | 30 |
2021-10-02 | 1 | tv | 30 | tv | 30 |
2021-10-09 | 1 | music | 10 | tv | 30 |
2021-10-15 | 1 | music | 40 | music | 50 |
2021-10-01 | 2 | game | 10 | game | 10 |
2021-10-01 | 2 | music | 10 | game | 10 |
2021-10-04 | 2 | music | 20 | music | 30 |
2021-10-04 | 2 | game | 10 | music | 30 |
2021-10-05 | 2 | tv | 40 | tv | 40 |
2021-10-11 | 2 | tv | 40 | tv | 80 |
2021-10-12 | 2 | game | 20 | tv | 80 |
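The outer `max` works because each mode's running total can only grow, assuming `mode_time` is never negative. So the largest running total seen so far for a user is always the current leader's total, and `max_by` returns the mode that owns it. Both windows use `ORDER BY date` with no explicit frame. That means the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) applies, and rows with the same date are peers. This is why both 2021-10-02 rows for user 1 get tv/30.

The plain-Python sketch below mirrors the two window steps so you can check that frame behaviour. `running` and `window_query` are hypothetical helpers. With `peers=False`, the sketch imitates a ROWS frame instead, and the first 2021-10-02 row would get game/20. Ties, such as user 2's game and music at 10 on 2021-10-01, are not deterministic in `max_by`; the sketch just keeps the first maximum it sees.

```python
from collections import defaultdict

# User 1's rows from the sample data: (date, user_id, mode_name, mode_time).
USER1 = [
    ("2021-10-01", 1, "game", 10),
    ("2021-10-02", 1, "game", 10),
    ("2021-10-02", 1, "tv", 30),
    ("2021-10-09", 1, "music", 10),
    ("2021-10-15", 1, "music", 40),
]


def running(values, order_keys, agg, peers=True):
    """Running aggregate over values already sorted by order_keys.

    peers=True imitates the default RANGE BETWEEN UNBOUNDED PRECEDING AND
    CURRENT ROW frame: the frame also covers later rows with the same key.
    peers=False imitates ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    """
    result = []
    for i in range(len(values)):
        end = i
        if peers:
            while end + 1 < len(values) and order_keys[end + 1] == order_keys[i]:
                end += 1
        result.append(agg(values[: end + 1]))
    return result


def window_query(rows, peers=True):
    """Hypothetical plain-Python mirror of the two-level window query."""
    # Inner query: sum(mode_time) OVER (PARTITION BY user_id, mode_name ORDER BY date)
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[(row[1], row[2])].append(i)
    rolling = [0] * len(rows)
    for idxs in parts.values():
        idxs.sort(key=lambda i: rows[i][0])
        sums = running([rows[i][3] for i in idxs], [rows[i][0] for i in idxs], sum, peers)
        for i, total in zip(idxs, sums):
            rolling[i] = total

    # Outer query: max(...) and max_by(mode_name, ...) OVER (PARTITION BY user_id ORDER BY date)
    users = defaultdict(list)
    for i, row in enumerate(rows):
        users[row[1]].append(i)
    out = []
    for user in sorted(users):
        idxs = sorted(users[user], key=lambda i: rows[i][0])
        keys = [rows[i][0] for i in idxs]
        pairs = [(rolling[i], rows[i][2]) for i in idxs]
        # On ties the engine may pick any mode; here the first maximum in the frame wins.
        best = running(pairs, keys, lambda frame: max(frame, key=lambda p: p[0]), peers)
        for i, (fav_time, fav_mode) in zip(idxs, best):
            out.append(rows[i] + (fav_mode, fav_time))
    return out


if __name__ == "__main__":
    print(window_query(USER1, peers=True)[1])   # ('2021-10-02', 1, 'game', 10, 'tv', 30)
    print(window_query(USER1, peers=False)[1])  # ('2021-10-02', 1, 'game', 10, 'game', 20)
```

Note also that `max_by` is a Presto/Trino aggregate. As far as I know, SQL Server (which the linked fiddle uses) does not provide it, so the favourite mode would have to be derived differently there.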