按天分组,在 presto 中另一个字段
group by day and another field in presto
我想按天(我的时间是在 unix 时代)和给我分数乘以乘数的名字分组。
我的 table 看起来像这样:
time | name | score | multiplier
1646733600000000 | Albert | 10 | 1
1646744400000000 | John | 5 | 1.5
1646780400000000 | John | 3 | 2.2
1646812800000000 | Albert | 7 | 1.1
1646816400000000 | Marie | 26 | 0.5
1646852400000000 | Marie | 1 | 2.2
1646870400000000 | Georgie | 30 | 0.75
1646924400000000 | Georgie | 2 | 0.5
1646938800000000 | Kevin | 15 | 0.3
结果将是:
day | name | score * multiplier
2022-03-08 | Albert | 10 * 1
2022-03-08 | John | (5 * 1.5) + (3 * 2.2)
2022-03-09 | Albert | 7 * 1.1
2022-03-09 | Marie | (26 * 0.5) + (1 * 2.2)
2022-03-10 | Georgie | (30 * 0.75) + (2 * 0.5)
2022-03-10 | Kevin | 15 * 0.3
这应该很简单 - 主要技巧是将时间戳除以 1000000 以秒而不是微秒为单位:
-- sample data
WITH dataset (time, name, score, multiplier) AS (
VALUES
(1646733600000000, 'Albert', 10, 1),
(1646744400000000, 'John', 5 , 1.5),
(1646780400000000, 'John', 3 , 2.2),
(1646812800000000, 'Albert', 7 , 1.1),
(1646816400000000, 'Marie', 26, 0.5),
(1646852400000000, 'Marie', 1 , 2.2),
(1646870400000000, 'Georgie', 30, 0.75),
(1646924400000000, 'Georgie', 2 , 0.5),
(1646938800000000, 'Kevin', 15, 0.3)
)
--query
select date(from_unixtime(time/1000000)) day, name, sum(score*multiplier) "score * multiplier"
from dataset
group by 1, name -- or date(from_unixtime(time/1000000)) instead of 1
order by 1, name -- for output order
输出:
day
name
score * multiplier
2022-03-08
Albert
10.0
2022-03-08
John
14.100000000000001
2022-03-09
Albert
7.700000000000001
2022-03-09
Marie
15.2
2022-03-10
Georgie
23.5
2022-03-10
Kevin
4.5
我想按天(我的时间是在 unix 时代)和给我分数乘以乘数的名字分组。
我的 table 看起来像这样:
time | name | score | multiplier
1646733600000000 | Albert | 10 | 1
1646744400000000 | John | 5 | 1.5
1646780400000000 | John | 3 | 2.2
1646812800000000 | Albert | 7 | 1.1
1646816400000000 | Marie | 26 | 0.5
1646852400000000 | Marie | 1 | 2.2
1646870400000000 | Georgie | 30 | 0.75
1646924400000000 | Georgie | 2 | 0.5
1646938800000000 | Kevin | 15 | 0.3
结果将是:
day | name | score * multiplier
2022-03-08 | Albert | 10 * 1
2022-03-08 | John | (5 * 1.5) + (3 * 2.2)
2022-03-09 | Albert | 7 * 1.1
2022-03-09 | Marie | (26 * 0.5) + (1 * 2.2)
2022-03-10 | Georgie | (30 * 0.75) + (2 * 0.5)
2022-03-10 | Kevin | 15 * 0.3
这应该很简单 - 主要技巧是将时间戳除以 1000000 以秒而不是微秒为单位:
-- sample data
WITH dataset (time, name, score, multiplier) AS (
VALUES
(1646733600000000, 'Albert', 10, 1),
(1646744400000000, 'John', 5 , 1.5),
(1646780400000000, 'John', 3 , 2.2),
(1646812800000000, 'Albert', 7 , 1.1),
(1646816400000000, 'Marie', 26, 0.5),
(1646852400000000, 'Marie', 1 , 2.2),
(1646870400000000, 'Georgie', 30, 0.75),
(1646924400000000, 'Georgie', 2 , 0.5),
(1646938800000000, 'Kevin', 15, 0.3)
)
--query
select date(from_unixtime(time/1000000)) day, name, sum(score*multiplier) "score * multiplier"
from dataset
group by 1, name -- or date(from_unixtime(time/1000000)) instead of 1
order by 1, name -- for output order
输出:
day | name | score * multiplier |
---|---|---|
2022-03-08 | Albert | 10.0 |
2022-03-08 | John | 14.100000000000001 |
2022-03-09 | Albert | 7.700000000000001 |
2022-03-09 | Marie | 15.2 |
2022-03-10 | Georgie | 23.5 |
2022-03-10 | Kevin | 4.5 |