'Immediate Follow' Page Path in BigQuery
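I am working in BigQuery to find out how many users complete a specific page path at any point in a session. Say the page path is Page 1 -> Page 2 -> Page 3, and the pages must be visited in that order, each immediately following the previous one. I can build a page path in BQ, but my approach only identifies users who visited these pages at some point in the session, so it also counts paths such as Page 1 -> Page 456 -> Page 2.
Any ideas?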
(SELECT [date]
, CASE WHEN pages like '/Page1' then fullVisitorId end as [users]
, CASE WHEN pages like '/Page1>>/Page2' then fullVisitorId end as [path_users_2]
, CASE WHEN pages like '/Page1>>/Page2>>/Page3' then fullVisitorId end as [path_users_3]
, [path_type]
, [path]
, [product]
, [device.deviceCategory]
FROM
( SELECT [date]
, [fullVisitorId]
, [visitId]
, [visitNumber]
, group_concat(hits.page.pagePath,'>>') as [pages]
, 'New Pages' as [path_type]
, 'Upgrade' as [path]
, 'Professional' as [product]
FROM
(
TABLE_DATE_RANGE
( [XXXXXX.ga_sessions_]
, TIMESTAMP('2014-06-01')
, TIMESTAMP('2014-06-05') )
)
where
(REGEXP_MATCH(hits.page.pagePath,r'^/Page1($|/$|\?|/\?|%3F)'))
or (REGEXP_MATCH(hits.page.pagePath,r'^/Page2($|/$|\?|/\?|%3F)'))
or ( (REGEXP_MATCH(hits.page.pagePath,r'^/Page3($|/$|\?|/\?|%3F)'))
and hits.transaction.transactionId is not null
and hits.item.productSku is not null
and hits.item.itemRevenue is not null )
group each by [date]
, [fullVisitorId]
, [visitId]
, [visitNumber]
, [path_type]
, [path]
, [product]
, [device.deviceCategory]
)
group each by
[date]
, [path_type]
, [path]
, [product]
, [users]
, [path_users_2]
, [path_users_3]
, [device.deviceCategory]
)
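You need to construct a sequence of queries and build up to the full path step by step, using hits.time as the time sequence. Here is an example taken from the Streak blog post "Using Google BigQuery for Event Tracking".
We can create a subquery that picks out the visitHomepage events: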
(SELECT sessionId as sessionId1,
timestamp as timestamp1
FROM [events.log]
WHERE name = "visitHomepage") AS step1
Then do the same for step2 and step3.
You can then combine these to get steps1_2:
(SELECT sessionId1,
timestamp1,
IF(timestamp1 < timestamp2, timestamp2, NULL) as timestamp2
FROM
(SELECT sessionId1,
timestamp1,
timestamp2
FROM step1
LEFT JOIN step2
ON sessionId1 = sessionId2)
) AS steps1_2
And finally, the subquery we are after:
(SELECT sessionId1 as sessionId,
timestamp1 as visitHomepageTimestamp,
timestamp2 as installExtensionTimestamp,
IF(timestamp2 < timestamp3, timestamp3, NULL) as signInTimestamp
FROM
(SELECT sessionId1,
timestamp1,
timestamp2,
timestamp3
FROM steps1_2
LEFT JOIN step3
ON sessionId1 = sessionId3)
) AS steps1_2_3
Read the blog post linked above for a granular, step-by-step explanation of how to construct the query, and also check out the BigQuery Cookbook.
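To make the join logic above easier to follow, here is a minimal Python sketch of the same three-step funnel, run on in-memory data rather than in BigQuery. The flat `(sessionId, name, timestamp)` event layout, the event names `installExtension` and `signIn` (inferred from the column aliases) and the helper names are assumptions for illustration. It also assumes at most one event of each name per session.

```python
from typing import Dict, List, Optional, Tuple

# (sessionId, event name, timestamp); this flat layout is an assumption.
Event = Tuple[str, str, int]
Row = Tuple[int, Optional[int], Optional[int]]

# Hypothetical event names for the three steps, inferred from the column aliases.
STEPS = ("visitHomepage", "installExtension", "signIn")


def later_or_none(previous: Optional[int], current: Optional[int]) -> Optional[int]:
    """Mimic IF(previous < current, current, NULL); a NULL operand yields NULL."""
    if previous is None or current is None:
        return None
    return current if previous < current else None


def funnel(events: List[Event]) -> Dict[str, Row]:
    # One lookup table per step: sessionId -> timestamp (the step1..step3 subqueries).
    steps: List[Dict[str, int]] = [{} for _ in STEPS]
    for session_id, name, timestamp in events:
        if name in STEPS:
            steps[STEPS.index(name)][session_id] = timestamp

    rows: Dict[str, Row] = {}
    for session_id, t1 in steps[0].items():
        # dict.get returns None for a missing session, like an unmatched LEFT JOIN.
        t2 = later_or_none(t1, steps[1].get(session_id))
        t3 = later_or_none(t2, steps[2].get(session_id))
        rows[session_id] = (t1, t2, t3)
    return rows


events = [
    ("a", "visitHomepage", 1), ("a", "installExtension", 5), ("a", "signIn", 9),
    ("b", "visitHomepage", 4), ("b", "installExtension", 2), ("b", "signIn", 8),
    ("c", "visitHomepage", 3),
]
result = funnel(events)
```

Note that this funnel only compares timestamps, so it checks that the steps happen in order, not that each step immediately follows the previous one.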
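Alternatively, you can sort the query by hits.time to define the order in which the user visited the pages, and use ROW_NUMBER or POSITION to number them, so that you can work further with that result set.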
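As a rough illustration of the numbering idea, the sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery, so the SQL is SQLite syntax and would need adapting. The `hits` table and its columns `session_id`, `hit_time` and `page_path` are hypothetical, and the sample rows are made up. Each hit gets a ROW_NUMBER within its session, and a self-join on consecutive numbers keeps only sessions where the three pages follow each other directly. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per hit.
conn.execute("CREATE TABLE hits (session_id TEXT, hit_time INTEGER, page_path TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("s1", 0, "/Page1"), ("s1", 10, "/Page2"), ("s1", 20, "/Page3"),
        ("s2", 0, "/Page1"), ("s2", 5, "/Page456"),
        ("s2", 9, "/Page2"), ("s2", 15, "/Page3"),
    ],
)

QUERY = """
WITH numbered AS (
    -- Number the hits of each session in time order.
    SELECT session_id,
           page_path,
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY hit_time) AS rn
    FROM hits
)
SELECT DISTINCT a.session_id
FROM numbered AS a
JOIN numbered AS b ON b.session_id = a.session_id AND b.rn = a.rn + 1
JOIN numbered AS c ON c.session_id = a.session_id AND c.rn = a.rn + 2
WHERE a.page_path = '/Page1'
  AND b.page_path = '/Page2'
  AND c.page_path = '/Page3'
ORDER BY a.session_id
"""

matching_sessions = [row[0] for row in conn.execute(QUERY)]
print(matching_sessions)
```

Session `s2` is excluded because `/Page456` sits between `/Page1` and `/Page2`, so their row numbers are not consecutive.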
For your specific use case, I'm pretty sure you can do this faster by avoiding the JOIN and GROUP BY.
Consider:
SELECT
[date], fullVisitorId, visitId, visitNumber,
GROUP_CONCAT(REGEXP_EXTRACT(hits.page.pagePath, '^(/[^/?]*)'), ">>")
WITHIN RECORD AS Sequence,
FROM
(TABLE_DATE_RANGE
( [XXXXXX.ga_sessions_]
, TIMESTAMP('2014-06-01')
, TIMESTAMP('2014-06-05') )
)
WHERE REGEXP_MATCH(hits.page.pagePath, r'^/Page[123]')
HAVING
Sequence CONTAINS "/Page1>>/Page2>>/Page3";
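This uses scoped aggregation at the RECORD level to avoid a GROUP BY over individual sessions.
Furthermore, individual records are atomic in BigQuery, and their repeated fields are processed in the order in which they were supplied at import time. So for GA session logs, the hit sub-records end up concatenated in order WITHIN RECORD. Flattening the hit timestamps and then combining them with comparisons really just redoes this work.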
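The concatenate-then-match idea can be sketched in plain Python on in-memory data. The function names and sample sessions below are made up for illustration, and each session is assumed to be a list of page paths already in hit order. The sketch also assumes that the WHERE filter drops non-matching hits before the concatenation, which is worth verifying against real data. Under that assumption, whether a detour page breaks the match depends on whether the filter is applied.

```python
import re
from typing import List

SEPARATOR = ">>"
TARGET = "/Page1>>/Page2>>/Page3"


def first_segment(page_path: str) -> str:
    """Mirror REGEXP_EXTRACT(pagePath, '^(/[^/?]*)'): keep the first path segment."""
    match = re.match(r"^(/[^/?]*)", page_path)
    return match.group(1) if match else ""


def session_sequence(page_paths: List[str], only_funnel_pages: bool) -> str:
    """Join a session's pages in hit order, optionally keeping only /Page1-3 hits."""
    if only_funnel_pages:
        page_paths = [p for p in page_paths if re.match(r"^/Page[123]", p)]
    return SEPARATOR.join(first_segment(p) for p in page_paths)


# Hypothetical sessions, each already in hit order.
direct = ["/Page1", "/Page2?ref=a", "/Page3/"]
detour = ["/Page1", "/Page456", "/Page2", "/Page3"]

direct_filtered = TARGET in session_sequence(direct, only_funnel_pages=True)
detour_filtered = TARGET in session_sequence(detour, only_funnel_pages=True)
detour_unfiltered = TARGET in session_sequence(detour, only_funnel_pages=False)
```

With the filter on, the detour session still matches, because `/Page456` is removed before joining. With the filter off, only sessions where the three pages follow each other directly match.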