BigQuery: search multiple tables and aggregate with first_seen and last_seen
I have a BigQuery database with several tables:
table1
id,timestamp,data
1,1428969600,AAAAA
2,1428969600,CCCCC
[..]
20,1428969600,ZZZZZ
table2
id,timestamp,data
1,1429056000,AAAAA
2,1429056000,BBBBB
3,1429056000,CCCCC
[..]
20,1429056000,ZZZZZ
table3
id,timestamp,data
1,1429142400,AAAAA
2,1429142400,BBBBB
3,1429142400,CCCCC
[..]
20,1429142400,ZZZZZ
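I want to run a search across all tables (table1, table2 and table3) to see when each value in the "data" field was first and last seen, and get the corresponding "timestamp" values.
This should be the result:
id,timestamp_first,timestamp_last,data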
1,1428969600,1429142400,AAAAA
2,1429056000,1429142400,BBBBB
3,1428969600,1429142400,CCCCC
[..]
20,1428969600,1429142400,ZZZZZ
Can anyone tell me how to do such a search?
Martin
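I would first union the tables (in BigQuery legacy SQL, a comma between table names means union, not join). Then there are two ways to do it:
- Use the analytic functions FIRST_VALUE and LAST_VALUE.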
SELECT id, timestamp_first, timestamp_last, data FROM
  (SELECT
     id,
     timestamp,
     data,
     FIRST_VALUE(timestamp) OVER(
       PARTITION BY id
       ORDER BY timestamp ASC
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
     AS timestamp_first,
     LAST_VALUE(timestamp) OVER(
       PARTITION BY id
       ORDER BY timestamp ASC
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
     AS timestamp_last
   FROM table1, table2, table3)
- Use the MIN/MAX aggregates on timestamp to find first/last, then join back to the same tables.
SELECT a.id id, timestamp_first, timestamp_last, data FROM
  (SELECT id, data FROM table1, table2, table3) a
INNER JOIN
  (SELECT
     id,
     MIN(timestamp) timestamp_first,
     MAX(timestamp) timestamp_last
   FROM table1, table2, table3 GROUP BY id) b
ON a.id = b.id
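To check the MIN/MAX idea against the sample rows without a BigQuery project, here is a rough sketch that runs the same aggregation in an in-memory SQLite database from Python. Several things in it are my assumptions, not part of the queries above: SQLite stands in for BigQuery, UNION ALL replaces the legacy-SQL comma, the helper name first_last_seen is made up for illustration, and the grouping is on data rather than id, because the question asks when each data value was first and last seen. The rows elided with [..] in the question are left out.

```python
import sqlite3

# Sample rows (id, timestamp, data) copied from the question.
# Rows hidden behind "[..]" in the question are not invented here.
TABLES = {
    "table1": [
        (1, 1428969600, "AAAAA"),
        (2, 1428969600, "CCCCC"),
        (20, 1428969600, "ZZZZZ"),
    ],
    "table2": [
        (1, 1429056000, "AAAAA"),
        (2, 1429056000, "BBBBB"),
        (3, 1429056000, "CCCCC"),
        (20, 1429056000, "ZZZZZ"),
    ],
    "table3": [
        (1, 1429142400, "AAAAA"),
        (2, 1429142400, "BBBBB"),
        (3, 1429142400, "CCCCC"),
        (20, 1429142400, "ZZZZZ"),
    ],
}

# UNION ALL plays the role of the legacy-SQL comma between table names.
# Grouping by data (an assumption, see above) gives one row per value.
QUERY = """
SELECT data,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM (
    SELECT timestamp, data FROM table1
    UNION ALL
    SELECT timestamp, data FROM table2
    UNION ALL
    SELECT timestamp, data FROM table3
)
GROUP BY data
ORDER BY data
"""


def first_last_seen():
    """Load the sample tables into SQLite and return (data, first, last) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in TABLES.items():
            conn.execute(
                f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)"
            )
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_last_seen():
        print(row)
```

With the sample rows, BBBBB comes out with timestamp_first 1429056000 because it is missing from table1, which matches the expected result in the question. Grouping by id instead would mix CCCCC and BBBBB under id 2, so which key to use depends on whether ids are stable across the tables.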