Transpose nested rows into columns in BigQuery with Google Analytics data
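I am interested in pulling visitors together with their custom dimension attributes, where each row is a unique fullvisitorid and the columns are the desired customDimensions values.
Using the London Cycle Helmet sample data as an example, here I am pulling the two custom dimensions I am interested in: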
SELECT fullvisitorid, customDimensions.index, customDimensions.value
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
where customDimensions.index in (2,3)
group by fullvisitorid, customDimensions.index, customDimensions.value
It gives results like the following:
+---------------+------------------------+------------------------+
| fullvisitorid | customDimensions_index | customDimensions_value |
+---------------+------------------------+------------------------+
| 1             | 2                      | Bronze                 |
| 1             | 3                      | Yes                    |
| 2             | 2                      | Bronze                 |
| 2             | 3                      | No                     |
| 3             | 2                      | Bronze                 |
| 3             | 3                      | Yes                    |
| 4             | 2                      | Platinum               |
| 4             | 3                      | Yes                    |
+---------------+------------------------+------------------------+
I would like to transpose the values, where customDimensions index 2 becomes a color column and customDimensions index 3 becomes a yesno column, so the results look like this:
+---------------+----------+-------+
| fullvisitorid | color    | yesno |
+---------------+----------+-------+
| 1             | Bronze   | Yes   |
| 2             | Bronze   | No    |
| 3             | Bronze   | Yes   |
| 4             | Platinum | Yes   |
+---------------+----------+-------+
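I could pull one dimension, then the other, and join them on fullvisitorid, but I would like to get the data this way in a single step. Thanks!
Here is the solution: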
SELECT
fullvisitorid,
FIRST(IF(customDimensions.index=2, customDimensions.value, NULL)) color,
FIRST(IF(customDimensions.index=3, customDimensions.value, NULL)) yesno
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
where customDimensions.index in (2,3)
group by fullvisitorid
It relies on the fact that any aggregation function, including FIRST, ignores NULLs.
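To see the NULL-skipping behavior in isolation, here is a minimal sketch that can be run locally with Python's built-in sqlite3 module. It is not BigQuery: the table name dims and the column name idx are made up for illustration, the rows are copied from the result table in the question, and MAX stands in for FIRST because SQLite has no FIRST aggregate. With at most one non-NULL value per visitor and index, the two should give the same answer.

```python
import sqlite3

# Flattened rows shaped like the first result table in the question.
# "dims" and "idx" are hypothetical names used only for this sketch.
rows = [
    ("1", 2, "Bronze"), ("1", 3, "Yes"),
    ("2", 2, "Bronze"), ("2", 3, "No"),
    ("3", 2, "Bronze"), ("3", 3, "Yes"),
    ("4", 2, "Platinum"), ("4", 3, "Yes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dims (fullvisitorid TEXT, idx INTEGER, value TEXT)")
conn.executemany("INSERT INTO dims VALUES (?, ?, ?)", rows)

# CASE without ELSE yields NULL for non-matching rows, and the aggregate
# skips those NULLs, so each output column only sees "its" index.
pivoted = conn.execute(
    """
    SELECT
      fullvisitorid,
      MAX(CASE WHEN idx = 2 THEN value END) AS color,
      MAX(CASE WHEN idx = 3 THEN value END) AS yesno
    FROM dims
    WHERE idx IN (2, 3)
    GROUP BY fullvisitorid
    ORDER BY fullvisitorid
    """
).fetchall()

for row in pivoted:
    print(row)
```

Each visitor collapses to a single row with one column per custom dimension index, which is the shape asked for in the question.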
2020-01 update: #standardSQL
SELECT
fullvisitorid,
(SELECT value FROM UNNEST(customDimensions) WHERE index=2) color,
(SELECT value FROM UNNEST(customDimensions) WHERE index=3) yesno,
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
Previously:
Mosha's answer is correct, but I wanted to add this one because it takes advantage of the nested nature of GA records:
SELECT
fullvisitorid,
FIRST(IF(customDimensions.index=2, customDimensions.value, NULL)) WITHIN RECORD color,
FIRST(IF(customDimensions.index=3, customDimensions.value, NULL)) WITHIN RECORD yesno
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE customDimensions.index in (2,3)
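Reasoning: instead of running a GROUP BY (which consumes resources, because it has to find and group every record that might share the same customerid), WITHIN RECORD looks only inside each individual row.
If a customerid has multiple rows (say they visited once with Bronze/Yes and then again with Platinum/No), the result will emit every row and combination, not only the first one.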
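The difference between the per-record and per-visitor approaches can be sketched in plain Python. This is only a model of the behavior described above, not how BigQuery executes either query. The sessions list is invented sample data, and dimension, per_record, and per_visitor are hypothetical helper names. The "first" value here follows list order, which BigQuery does not necessarily guarantee.

```python
# Invented sample data: each dict stands in for one GA session row with a
# nested, repeated customDimensions field. Visitor "1" has two sessions.
sessions = [
    {"fullvisitorid": "1", "customDimensions": [
        {"index": 2, "value": "Bronze"}, {"index": 3, "value": "Yes"}]},
    {"fullvisitorid": "1", "customDimensions": [
        {"index": 2, "value": "Platinum"}, {"index": 3, "value": "No"}]},
    {"fullvisitorid": "2", "customDimensions": [
        {"index": 1, "value": "x"},
        {"index": 2, "value": "Bronze"}, {"index": 3, "value": "No"}]},
]


def dimension(session, index):
    """Return the first value with the given index inside one session, or None."""
    return next(
        (d["value"] for d in session["customDimensions"] if d["index"] == index),
        None,
    )


def per_record(sessions):
    """Model of the per-row approach: one output row per session, no grouping."""
    return [(s["fullvisitorid"], dimension(s, 2), dimension(s, 3)) for s in sessions]


def per_visitor(sessions):
    """Model of the GROUP BY approach: first non-None value per visitor and column."""
    merged = {}
    for visitor, color, yesno in per_record(sessions):
        first_color, first_yesno = merged.get(visitor, (None, None))
        merged[visitor] = (
            first_color if first_color is not None else color,
            first_yesno if first_yesno is not None else yesno,
        )
    return [(v, c, y) for v, (c, y) in merged.items()]


print(per_record(sessions))
print(per_visitor(sessions))
```

In this model, visitor "1" shows up twice in the per-record output (Bronze/Yes and Platinum/No) but only once in the per-visitor output, which mirrors the caveat about a customerid with multiple rows.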