Sessions mismatch between Google Analytics and BigQuery when unnesting hits and hits.product together
When I run the following query, I see a 15% discrepancy between the Google Analytics and BigQuery data:
SELECT
  SUM(Sessions) AS Sessions
FROM (
  SELECT
    PARSE_DATE("%Y%m%d", date) AS DATE,
    COUNT(DISTINCT CONCAT(fullVisitorId,"-",CAST(visitStartTime AS STRING))) AS Sessions,
    (COUNT(DISTINCT
        CASE
          WHEN totals.bounces = 1 THEN CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))
          ELSE NULL
        END ) / COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))))*100 AS Bounce_Rate,
    COUNT(DISTINCT hits.transaction.transactionId) AS Transactions,
    SUM(hits.transaction.transactionRevenue)/1000000 AS Revenue,
    SUM(p.productRevenue)/1000000 AS Product_Revenue,
    (COUNT(DISTINCT hits.transaction.transactionId) / COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING), CAST(visitStartTime AS STRING))))*100 AS Ecommerce_Conversion_Rate,
    (SUM(hits.transaction.transactionRevenue)/1000000)/COUNT(DISTINCT hits.transaction.transactionId) AS Avg_Order_Value,
    SUM(hits.item.itemQuantity) / COUNT(hits.transaction.transactionId) AS Avg_Quantity,
    device.deviceCategory AS DeviceCategory,
    channelGrouping AS DefaultChannelGrouping,
    CONCAT(trafficSource.source," / ",trafficSource.medium) AS Source_Medium
  FROM
    `[Project_ID].[Dataset].ga_sessions_2019*`,
    UNNEST(hits) AS hits,
    UNNEST(hits.product) AS p
  GROUP BY
    DATE,
    DeviceCategory,
    DefaultChannelGrouping,
    Source_Medium )
WHERE
  DATE BETWEEN "2019-11-17"
  AND "2019-11-23"
But when I remove UNNEST(hits.product) AS p, the discrepancy is much smaller. How can I UNNEST hits and hits.product together?
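You are cross joining with the product array. If a hit's product array is missing or empty, the cross join produces no rows for that hit, which effectively erases the whole hit, and sometimes the whole session (if it consists of a single hit with no product information).
You have to LEFT JOIN the product array to avoid dropping hits and sessions.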
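The row-dropping behaviour described above can be reproduced on a tiny inline dataset. This is only a sketch: the `sessions` CTE and its two rows are made up for illustration and stand in for a real `ga_sessions_*` table, keeping just the fields needed to count sessions.

```sql
-- Hypothetical data: session A has one hit with a product,
-- session B has one hit with an empty product array.
WITH sessions AS (
  SELECT
    'A' AS fullVisitorId,
    1 AS visitStartTime,
    [STRUCT(1 AS hitNumber,
            [STRUCT('sku1' AS productSKU)] AS product)] AS hits
  UNION ALL
  SELECT
    'B',
    2,
    [STRUCT(1 AS hitNumber,
            ARRAY<STRUCT<productSKU STRING>>[] AS product)]
)
-- Cross join: the hit with the empty array yields no rows, so session B disappears.
SELECT
  'cross join' AS join_type,
  COUNT(DISTINCT CONCAT(fullVisitorId, '-', CAST(visitStartTime AS STRING))) AS Sessions
FROM sessions AS t
CROSS JOIN UNNEST(t.hits) AS h
CROSS JOIN UNNEST(h.product) AS p
UNION ALL
-- Left join: the hit is kept with p = NULL, so session B is still counted.
SELECT
  'left join',
  COUNT(DISTINCT CONCAT(fullVisitorId, '-', CAST(visitStartTime AS STRING)))
FROM sessions AS t
CROSS JOIN UNNEST(t.hits) AS h
LEFT JOIN UNNEST(h.product) AS p
```

With this data the cross join variant should report 1 session and the left join variant 2, which is the same kind of gap the question describes.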
For the query above, the FROM clause becomes:
FROM `[Project_ID].[Dataset].ga_sessions_2019*` AS t
CROSS JOIN UNNEST(t.hits) AS h
LEFT JOIN UNNEST(h.product) AS p
Or, in short:
FROM `[Project_ID].[Dataset].ga_sessions_2019*` AS t, t.hits h LEFT JOIN h.product p
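As a rough sketch of how this fits into the original query, here is a trimmed version that keeps only the session count and product revenue. It assumes the unnested hit is aliased as `h`, so any hit-level field from the original query would have to be referenced as `h.…` rather than `hits.…`; the table name is the placeholder from the question.

```sql
SELECT
  -- Sessions survive even when a hit has no products, because p is NULL-padded.
  COUNT(DISTINCT CONCAT(fullVisitorId, "-", CAST(visitStartTime AS STRING))) AS Sessions,
  -- SUM ignores the NULL rows produced by the left join.
  SUM(p.productRevenue) / 1000000 AS Product_Revenue
FROM
  `[Project_ID].[Dataset].ga_sessions_2019*` AS t
CROSS JOIN UNNEST(t.hits) AS h
LEFT JOIN UNNEST(h.product) AS p
WHERE
  PARSE_DATE("%Y%m%d", date) BETWEEN DATE "2019-11-17" AND DATE "2019-11-23"
```

One thing to watch: after unnesting products, each hit is repeated once per product, so hit-level sums such as SUM(h.transaction.transactionRevenue) would be counted several times for multi-product hits. It may be safer to compute those in a separate query that does not unnest the product array.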