Querying in BigQuery?
I have a package table in BigQuery that looks like this:
Packageid Scanid dispatchid timestamp status
p1 s1 null t1 'in'
p2 s1 xxx t2 'in'
p1 s2 yyy t3 'pkin'
p1 s3 sss t4 'iwi'
p1 s4 eee t5 'lhp'
p2 s2 uuuu t6 'uio'
p2 s3 null t7 'jsk'
I want to retrieve the following details:
Packageid Latest-Scanid First-Dispatch-time Last-Dispatch-time latest-status
p1 s4 t3 t5 'lhp'
p2 s3 t2 t6 'jsk'
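First-Dispatch-time is the time at which a dispatch id first appears in the package's scans.
Last-Dispatch-time is the time at which a dispatch id last appears in the package's scans.
Is there any way to get the table above using BigQuery, or using functions you define in BigQuery?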
One approach uses window functions and conditional aggregation:
select packageid,
       -- seqnum = 1 marks the most recent scan of each package
       max(case when seqnum = 1 then scanid end) as latest_scanid,
       -- only scans that carry a dispatch id count towards the dispatch times
       min(case when dispatchid is not null then timestamp end) as first_dispatch_time,
       max(case when dispatchid is not null then timestamp end) as last_dispatch_time,
       max(case when seqnum = 1 then status end) as latest_status
from (select t.*,
             row_number() over (partition by packageid order by timestamp desc) as seqnum
      from t
     ) t
group by packageid;
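To check the conditional-aggregation idea against the sample rows from the question, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here, and several details are assumptions made for illustration: the table name `t`, the column name `ts` (in place of `timestamp`), the integers 1..7 standing in for t1..t7, and the helper name `latest_per_package`. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; integers 1..7 stand in for t1..t7 (assumption).
ROWS = [
    ("p1", "s1", None, 1, "in"),
    ("p2", "s1", "xxx", 2, "in"),
    ("p1", "s2", "yyy", 3, "pkin"),
    ("p1", "s3", "sss", 4, "iwi"),
    ("p1", "s4", "eee", 5, "lhp"),
    ("p2", "s2", "uuuu", 6, "uio"),
    ("p2", "s3", None, 7, "jsk"),
]

QUERY = """
select packageid,
       max(case when seqnum = 1 then scanid end) as latest_scanid,
       min(case when dispatchid is not null then ts end) as first_dispatch_time,
       max(case when dispatchid is not null then ts end) as last_dispatch_time,
       max(case when seqnum = 1 then status end) as latest_status
from (select t.*,
             row_number() over (partition by packageid order by ts desc) as seqnum
      from t
     ) t
group by packageid
order by packageid
"""


def latest_per_package(rows):
    """Load the rows into a scratch table and return one summary row per package."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (packageid text, scanid text, dispatchid text, "
        "ts integer, status text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_package(ROWS):
        print(row)
```

The inner query numbers each package's scans from newest to oldest, so `seqnum = 1` picks the latest scan id and status, while the two `case` expressions ignore scans without a dispatch id when computing the first and last dispatch times.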
Note that this is written for SQL Server and may or may not work in MySQL.
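SELECT Packageid,
       -- latest scan id and status come from the most recent row of each package
       (SELECT TOP 1 p.Scanid FROM Package p
        WHERE p.Packageid = Package.Packageid
        ORDER BY p.timestamp DESC) [Latest-Scanid],
       MIN(CASE WHEN dispatchid IS NOT NULL THEN timestamp END) [First-Dispatch-time],
       MAX(CASE WHEN dispatchid IS NOT NULL THEN timestamp END) [Last-Dispatch-time],
       (SELECT TOP 1 p.status FROM Package p
        WHERE p.Packageid = Package.Packageid
        ORDER BY p.timestamp DESC) [latest-status]
FROM Package
GROUP BY Packageid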
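The query below uses a "dirty" trick (see not_null_ts) that eliminates the outer GROUP BY and computes everything in the inner SELECT instead (ts here stands for the timestamp column):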
SELECT packageid, latest_scanid, first_dispatch_time, last_dispatch_time, latest_status
FROM (
  SELECT packageid,
    IF(dispatchid IS NULL, NULL, ts) AS not_null_ts,
    FIRST_VALUE(scanid) OVER(PARTITION BY packageid ORDER BY ts DESC) AS latest_scanid,
    MIN(not_null_ts) OVER(PARTITION BY packageid) AS first_dispatch_time,
    MAX(not_null_ts) OVER(PARTITION BY packageid) AS last_dispatch_time,
    FIRST_VALUE(status) OVER(PARTITION BY packageid ORDER BY ts DESC) AS latest_status,
    ROW_NUMBER() OVER(PARTITION BY packageid ORDER BY not_null_ts DESC) AS line
  FROM YourTable
)
WHERE line = 1
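The trick above depends on the engine letting a window function refer to an alias (not_null_ts) defined in the same SELECT list. For engines that may not allow this, one possible variant computes the alias one level deeper and keeps everything else as window functions. The sketch below tries that variant in an in-memory SQLite database from Python. It is an illustration under stated assumptions, not the answer's original query: SQLite stands in for BigQuery, the table name `scans`, the integers 1..7 in place of t1..t7, and the helper name `summarize_with_windows` are all hypothetical, and the row numbering orders by `ts` rather than `not_null_ts`. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; integers 1..7 stand in for t1..t7 (assumption).
ROWS = [
    ("p1", "s1", None, 1, "in"),
    ("p2", "s1", "xxx", 2, "in"),
    ("p1", "s2", "yyy", 3, "pkin"),
    ("p1", "s3", "sss", 4, "iwi"),
    ("p1", "s4", "eee", 5, "lhp"),
    ("p2", "s2", "uuuu", 6, "uio"),
    ("p2", "s3", None, 7, "jsk"),
]

WINDOW_QUERY = """
select packageid, latest_scanid, first_dispatch_time, last_dispatch_time, latest_status
from (
  select packageid,
         first_value(scanid) over (partition by packageid order by ts desc) as latest_scanid,
         min(not_null_ts) over (partition by packageid) as first_dispatch_time,
         max(not_null_ts) over (partition by packageid) as last_dispatch_time,
         first_value(status) over (partition by packageid order by ts desc) as latest_status,
         row_number() over (partition by packageid order by ts desc) as line
  from (
    -- not_null_ts is defined one level deeper, so the window functions
    -- above refer to a real column rather than a same-level alias
    select s.*, case when dispatchid is not null then ts end as not_null_ts
    from scans s
  )
)
where line = 1
order by packageid
"""


def summarize_with_windows(rows):
    """Return one summary row per package using window functions only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table scans (packageid text, scanid text, dispatchid text, "
        "ts integer, status text)"
    )
    conn.executemany("insert into scans values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize_with_windows(ROWS):
        print(row)
```

Every window value is the same on all rows of a package, so keeping any single row per package (here the one with `line = 1`) removes the need for an outer GROUP BY.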
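I found a while ago that this trick works for me, but I don't think I have ever seen it explicitly documented, unless it is simply considered an obvious usage. I never gave it much thought.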