How to use BigQuery Analytic Functions to calculate time between timestamped rows?
I have a dataset representing analytics events, for example:
Row  timestamp                account_id  type
1    2018-11-14 21:05:40 UTC  abc         start
2    2018-11-14 21:05:40 UTC  xyz         another_type
3    2018-11-26 22:01:19 UTC  xyz         start
4    2018-11-26 22:01:23 UTC  abc         start
5    2018-11-26 22:01:29 UTC  xyz         some_other_type
11   2018-11-26 22:13:58 UTC  xyz         start
...
There are a number of account_ids. For each account_id, I need to find the average time between its start records.
I'm trying to use the analytic functions described here. My end goal is a table like:
Row  account_id  avg_time_between_events_mins
1    xyz         53
2    abc         47
3    pqr         65
...
My best attempt, based on this post, looks like this:
WITH
  events AS (
    SELECT
      -- running count of 'start' rows so far; each start begins a new group
      COUNTIF(type = 'start' AND account_id = 'abc') OVER (ORDER BY timestamp) AS diff,
      timestamp
    FROM
      `myproject.dataset.events`
    WHERE
      account_id = 'abc')
SELECT
  min(timestamp) AS start_time,
  max(timestamp) AS next_start_time,
  ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
  events
GROUP BY
  diff
For a specific account_id, this calculates the time between each start event and the last non-start event before the next start event.
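To see why the first attempt measures the wrong interval, here is a minimal sketch that replays the same running-count grouping on the question's xyz sample rows. It assumes Python's built-in `sqlite3` module as a stand-in for BigQuery (SQLite 3.25+ is needed for window functions), a hypothetical `events_raw(ts, account_id, type)` table with timestamps stored as ISO text, and `SUM(CASE ...)` in place of BigQuery's `COUNTIF`.

```python
import sqlite3

# The question's sample rows for account xyz: (timestamp, account_id, type).
# Table and column names (events_raw, ts) are hypothetical stand-ins.
ROWS = [
    ("2018-11-14 21:05:40", "xyz", "another_type"),
    ("2018-11-26 22:01:19", "xyz", "start"),
    ("2018-11-26 22:01:29", "xyz", "some_other_type"),
    ("2018-11-26 22:13:58", "xyz", "start"),
]

# Same shape as the first attempt: a running count of 'start' rows labels each
# row with the number of starts seen so far, so every start opens a new group.
QUERY = """
WITH events AS (
  SELECT
    SUM(CASE WHEN type = 'start' THEN 1 ELSE 0 END) OVER (ORDER BY ts) AS diff,
    ts
  FROM events_raw
)
SELECT
  diff,
  MIN(ts) AS start_time,
  MAX(ts) AS last_event_time,
  -- strftime('%s', ...) yields Unix seconds as text; cast before subtracting
  CAST(strftime('%s', MAX(ts)) AS INTEGER)
    - CAST(strftime('%s', MIN(ts)) AS INTEGER) AS seconds_between
FROM events
GROUP BY diff
ORDER BY diff
"""


def run_first_attempt(rows):
    """Load rows into an in-memory table and return the grouped result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events_raw (ts TEXT, account_id TEXT, type TEXT)")
    conn.executemany("INSERT INTO events_raw VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_first_attempt(ROWS):
    print(row)
```

Group 1 runs from the start at 22:01:19 only to the some_other_type event at 22:01:29 (10 seconds), never reaching the next start at 22:13:58, because that next start already belongs to group 2.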
I've tried using PARTITION and a WINDOW FRAME CLAUSE like this:
WITH
  events AS (
    SELECT
      -- counts this row plus the next start of the same account (2, or 1 for the last one)
      COUNT(*) OVER (PARTITION BY account_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS diff,
      timestamp
    FROM
      `myproject.dataset.events`
    WHERE
      type = 'start')
SELECT
  min(timestamp) AS start_time,
  max(timestamp) AS next_start_time,
  ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
  events
GROUP BY
  diff
But I get a nonsense result table. Can anyone tell me how to write, and reason about, a query like this?
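The diff column in the second attempt can be inspected the same way. This sketch again assumes Python's `sqlite3` as a stand-in for BigQuery and a hypothetical `events_raw(ts, account_id, type)` table holding the question's start rows. With a frame of `CURRENT ROW AND 1 FOLLOWING`, `COUNT(*)` can only be 2 (a following row exists) or 1 (the last start of an account), so `GROUP BY diff` collapses every account into at most two groups.

```python
import sqlite3

# The 'start' rows from the question's sample data: (timestamp, account_id, type).
START_ROWS = [
    ("2018-11-14 21:05:40", "abc", "start"),
    ("2018-11-26 22:01:19", "xyz", "start"),
    ("2018-11-26 22:01:23", "abc", "start"),
    ("2018-11-26 22:13:58", "xyz", "start"),
]

# The inner query of the second attempt: count the rows in a 2-row window frame.
DIFF_QUERY = """
SELECT
  account_id,
  ts,
  COUNT(*) OVER (
    PARTITION BY account_id ORDER BY ts
    ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
  ) AS diff
FROM events_raw
WHERE type = 'start'
ORDER BY account_id, ts
"""


def frame_counts(rows):
    """Return (account_id, ts, diff) for every start row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events_raw (ts TEXT, account_id TEXT, type TEXT)")
    conn.executemany("INSERT INTO events_raw VALUES (?, ?, ?)", rows)
    result = conn.execute(DIFF_QUERY).fetchall()
    conn.close()
    return result


for row in frame_counts(START_ROWS):
    print(row)
```

Grouping these rows by diff puts the first start of abc and of xyz into one group and their last starts into another, so the MIN/MAX per group mixes accounts and says nothing about consecutive pairs.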
You don't really need analytic functions:
select
  account_id,
  timestamp_diff(max(timestamp), min(timestamp), MINUTE) / nullif(count(*) - 1, 0) as avg_time_between_events_mins
from `myproject.dataset.events`
where type = 'start'
group by account_id;
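This is the most recent timestamp minus the oldest, divided by the number of starts minus one. That is the average time between starts, because the gaps between consecutive starts add up to the last start minus the first, and there is one fewer gap than there are starts. The nullif returns NULL instead of dividing by zero for accounts with only one start.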
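If you do want an analytic function, LAG over PARTITION BY account_id ORDER BY timestamp gives each start the previous start of the same account, and averaging those gaps gives the same number as the aggregate-only query above. This sketch checks that on the question's sample rows, again assuming Python's `sqlite3` as a stand-in for BigQuery and a hypothetical `events_raw(ts, account_id, type)` table; in BigQuery you would use `TIMESTAMP_DIFF` rather than the `strftime('%s', ...)` arithmetic. It works in seconds to avoid minute truncation.

```python
import sqlite3

# Sample rows from the question: (timestamp, account_id, type).
ROWS = [
    ("2018-11-14 21:05:40", "abc", "start"),
    ("2018-11-14 21:05:40", "xyz", "another_type"),
    ("2018-11-26 22:01:19", "xyz", "start"),
    ("2018-11-26 22:01:23", "abc", "start"),
    ("2018-11-26 22:01:29", "xyz", "some_other_type"),
    ("2018-11-26 22:13:58", "xyz", "start"),
]

# Analytic version: LAG fetches the previous start of the same account. The
# first start of each account has no predecessor, so its gap is NULL and AVG
# ignores it.
LAG_QUERY = """
WITH starts AS (
  SELECT
    account_id,
    CAST(strftime('%s', ts) AS INTEGER) AS t,
    LAG(CAST(strftime('%s', ts) AS INTEGER))
      OVER (PARTITION BY account_id ORDER BY ts) AS prev_t
  FROM events_raw
  WHERE type = 'start'
)
SELECT account_id, AVG(t - prev_t) AS avg_gap_seconds
FROM starts
GROUP BY account_id
ORDER BY account_id
"""

# Aggregate-only version: (last start - first start) / (number of starts - 1).
SPAN_QUERY = """
SELECT
  account_id,
  (MAX(CAST(strftime('%s', ts) AS INTEGER)) - MIN(CAST(strftime('%s', ts) AS INTEGER)))
    * 1.0 / NULLIF(COUNT(*) - 1, 0) AS avg_gap_seconds
FROM events_raw
WHERE type = 'start'
GROUP BY account_id
ORDER BY account_id
"""


def compare(rows):
    """Return (lag_result, span_result), each a list of (account_id, avg_gap_seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events_raw (ts TEXT, account_id TEXT, type TEXT)")
    conn.executemany("INSERT INTO events_raw VALUES (?, ?, ?)", rows)
    lag_result = conn.execute(LAG_QUERY).fetchall()
    span_result = conn.execute(SPAN_QUERY).fetchall()
    conn.close()
    return lag_result, span_result


lag_result, span_result = compare(ROWS)
print(lag_result)
print(span_result)
```

With only two starts per account in this sample, each average is a single gap (759 seconds for xyz); the two queries still agree for any number of starts because the consecutive gaps telescope to last minus first. The LAG form is the one to reach for when you need the individual gaps, for example a median or a per-gap filter.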