Tracking Time Differences Between Hits in Google Analytics Data
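So I'm using Google Analytics data in BigQuery and want to track the time differences between hits, grouped by a custom dimension of mine that can change within a session. If my data looks like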
__________________________
| customDimension1 | Time |
|__________________|______|
| abc              | t1   |
| abc              | t2   |
| def              | t3   |
| def              | t4   |
| def              | t5   |
| abc              | t6   |
| abc              | t7   |
|__________________|______|
I'd like to get something like
_______________________________________
| customDimension1 | Time | Difference |
|__________________|______|____________|
| abc              | t1   | t2 - t1    |
| abc              | t2   | t3 - t2    |
| def              | t3   | t4 - t3    |
| def              | t4   | t5 - t4    |
| def              | t5   | t6 - t5    |
| abc              | t6   | t7 - t6    |
| abc              | t7   | 0          |
|__________________|______|____________|
Any good ideas on how to do this without going through a Dataflow/Dataproc transformation?
The following is for BigQuery Standard SQL (and assumes the Time column is of TIMESTAMP data type):
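#standardSQL
SELECT *,
  -- LEAD(Time) is the time of the next hit; the last row has no next hit, so IFNULL falls back to Time and the difference is 0
  TIMESTAMP_DIFF(IFNULL(LEAD(Time) OVER(ORDER BY Time), Time), Time, SECOND) AS Difference
FROM `project.dataset.table`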
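Note that this query orders every row of the table in a single window. In a real GA export, hits from many sessions interleave in time, so you will probably want a PARTITION BY clause on whatever identifies a session in your data (in the Universal Analytics export that is typically fullVisitorId plus visitId) so that a difference never spans two sessions. The sketch below illustrates the idea locally with Python's built-in sqlite3 module rather than BigQuery. It assumes SQLite 3.25 or newer (needed for window functions) and stores times as integer seconds because SQLite has no TIMESTAMP type; the table name hits, the column session_id and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample: two sessions whose hits interleave in time.
# Times are plain integer seconds (assumption: SQLite has no TIMESTAMP type).
ROWS = [
    ("s1", "abc", 0),
    ("s2", "abc", 1),
    ("s1", "def", 5),
    ("s2", "def", 9),
    ("s1", "abc", 12),
]

QUERY = """
SELECT session_id, customDimension1, t,
       -- Next hit within the SAME session; the last hit falls back to itself, giving 0.
       COALESCE(LEAD(t) OVER (PARTITION BY session_id ORDER BY t), t) - t AS difference
FROM hits
ORDER BY session_id, t
"""


def time_to_next_hit(rows):
    """Return (session_id, customDimension1, t, difference) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (session_id TEXT, customDimension1 TEXT, t INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in time_to_next_hit(ROWS):
        print(row)
```

Without the PARTITION BY, the s1 hit at 0 would get a difference of 1 (the s2 hit), which is rarely what you want for per-session timing.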
You can test and play with the above using dummy data, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc' customDimension1, TIMESTAMP '2020-08-26 23:03:21.938228 UTC' Time UNION ALL
SELECT 'abc', '2020-08-26 23:03:23.938228 UTC' UNION ALL
SELECT 'def', '2020-08-26 23:03:26.938228 UTC' UNION ALL
SELECT 'def', '2020-08-26 23:03:28.938228 UTC' UNION ALL
SELECT 'def', '2020-08-26 23:03:41.938228 UTC' UNION ALL
SELECT 'abc', '2020-08-26 23:03:51.938228 UTC' UNION ALL
SELECT 'abc', '2020-08-26 23:03:55.938228 UTC'
)
SELECT *,
TIMESTAMP_DIFF(IFNULL(LEAD(Time) OVER(ORDER BY Time), Time), Time, SECOND) AS Difference
FROM `project.dataset.table`
which produces the output
Row  customDimension1  Time                            Difference
1    abc               2020-08-26 23:03:21.938228 UTC  2
2    abc               2020-08-26 23:03:23.938228 UTC  3
3    def               2020-08-26 23:03:26.938228 UTC  2
4    def               2020-08-26 23:03:28.938228 UTC  13
5    def               2020-08-26 23:03:41.938228 UTC  10
6    abc               2020-08-26 23:03:51.938228 UTC  4
7    abc               2020-08-26 23:03:55.938228 UTC  0
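If "grouped by the custom dimension" means totalling the time spent in each uninterrupted stretch of the same customDimension1 value (abc, then def, then abc again), one common approach is "gaps and islands": flag rows where the value differs from the previous row using LAG, turn the flags into run ids with a running SUM, then aggregate per run. The sketch below runs that shape in Python's sqlite3 (SQLite 3.25+ assumed) on the dummy data above, converted to integer seconds after the first hit because SQLite has no TIMESTAMP type. The table name hits and the columns t, is_new_run and run_id are hypothetical; BigQuery also supports LAG and running SUM, so a similar query using TIMESTAMP_DIFF should carry over, but treat that as untested.

```python
import sqlite3

# The dummy data above, as seconds after the first hit (23:03:21.938228 -> 0).
ROWS = [
    ("abc", 0), ("abc", 2), ("def", 5), ("def", 7),
    ("def", 20), ("abc", 30), ("abc", 34),
]

QUERY = """
WITH diffs AS (
  SELECT customDimension1, t,
         -- Same per-hit difference as the BigQuery answer (last hit gets 0).
         COALESCE(LEAD(t) OVER (ORDER BY t), t) - t AS difference,
         -- 1 when the dimension changes (or on the first row, where LAG is NULL).
         CASE WHEN customDimension1 = LAG(customDimension1) OVER (ORDER BY t)
              THEN 0 ELSE 1 END AS is_new_run
  FROM hits
),
runs AS (
  -- Running total of change flags gives each consecutive stretch its own id.
  SELECT *, SUM(is_new_run) OVER (ORDER BY t) AS run_id
  FROM diffs
)
SELECT run_id, customDimension1, MIN(t) AS run_start, SUM(difference) AS seconds_in_run
FROM runs
GROUP BY run_id, customDimension1
ORDER BY run_id
"""


def time_per_run(rows):
    """Return (run_id, customDimension1, run_start, seconds_in_run) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (customDimension1 TEXT, t INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in time_per_run(ROWS):
        print(row)
```

On this data the three runs total 5 seconds (2 + 3), 25 seconds (2 + 13 + 10) and 4 seconds (4 + 0), matching the Difference column above summed per stretch.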