How to find inactive users in the ga4_obfuscated_sample_ecommerce public dataset
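I recently started using GCP and am learning how to work with public datasets.
I am trying to identify N-day inactive users with SQL.
You can find the dataset here: https://developers.google.com/analytics/bigquery/web-ecommerce-demo-dataset
I found the following code in the documentation.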
/**
* Builds an audience of N-Day Inactive Users.
*
* N-Day inactive users = users in the last M days who have not logged one
* event with event param engagement_time_msec > 0 in the last N days
* where M > N.
*/
SELECT
COUNT(DISTINCT MDaysUsers.user_id) AS n_day_inactive_users_count
FROM
(
SELECT
user_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`YOUR_TABLE.events_*` AS T
CROSS JOIN
T.event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
/* Has engaged in last M = 7 days */
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131'
) AS MDaysUsers
-- EXCEPT ALL is not yet implemented in BigQuery. Use LEFT JOIN in the interim.
LEFT JOIN
(
SELECT
user_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`YOUR_TABLE.events_*` AS T
CROSS JOIN
T.event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
/* Has engaged in last N = 2 days */
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131'
) AS NDaysUsers
ON MDaysUsers.user_id = NDaysUsers.user_id
WHERE
NDaysUsers.user_id IS NULL;
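However, after replacing user_id with user_pseudo_id, setting YOUR_TABLE to bigquery-public-data.ga4_obfuscated_sample_ecommerce, and correcting the dates, the query returns no results.
Is there a way to apply this code to the sample dataset?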
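From what I can see, you have to remove the _TABLE_SUFFIX line, because the bigquery-public-data.ga4_obfuscated_sample_ecommerce dataset has only one events table.
What I did was get the maximum timestamp in the data and use it in place of CURRENT_TIMESTAMP() to correct the dates.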
SELECT MAX(TIMESTAMP_MICROS(event_timestamp)) AS max_event_time
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`;
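Whether the _TABLE_SUFFIX filter really has to go depends on which tables the events_* wildcard matches. A minimal sketch for checking this is below; the column aliases (first_suffix, last_suffix, table_count) are illustrative names, not something from the documentation sample.

```sql
-- Sketch: report the range and number of tables matched by the wildcard.
-- _TABLE_SUFFIX is the pseudo-column BigQuery exposes for wildcard tables.
SELECT
  MIN(_TABLE_SUFFIX) AS first_suffix,
  MAX(_TABLE_SUFFIX) AS last_suffix,
  COUNT(DISTINCT _TABLE_SUFFIX) AS table_count
FROM
  `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`;
```

If the suffixes it reports fall inside the BETWEEN bounds, the filter is not excluding anything, and the comparison against CURRENT_TIMESTAMP() is the more likely cause of the empty result.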
After that, I made the changes you mentioned: I replaced user_id with user_pseudo_id and corrected the timestamp and the table name.
Consider the following approach:
SELECT
COUNT(DISTINCT MDaysUsers.user_pseudo_id) AS n_day_inactive_users_count
FROM
(
SELECT
user_pseudo_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS T
CROSS JOIN
T.event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
/* Has engaged in last M = 7 days */
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB("2021-01-31 23:59:55.412363", INTERVAL 7 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
) AS MDaysUsers
-- EXCEPT ALL is not yet implemented in BigQuery. Use LEFT JOIN in the interim.
LEFT JOIN
(
SELECT
user_pseudo_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS T
CROSS JOIN
T.event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
/* Has engaged in last N = 2 days */
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB("2021-01-31 23:59:55.412363", INTERVAL 2 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
) AS NDaysUsers
ON MDaysUsers.user_pseudo_id = NDaysUsers.user_pseudo_id
WHERE
NDaysUsers.user_pseudo_id IS NULL;
With these changes, the query returns a result on the sample dataset.
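As a variation, the hard-coded timestamp can be avoided by computing the anchor inside the query. The sketch below assumes the same definition as the documentation sample (engaged in the last M = 7 days, not engaged in the last N = 2 days) and rewrites the LEFT JOIN as a check on each user's most recent engaged event. The CTE names anchor and last_engaged and their column aliases are hypothetical.

```sql
-- Sketch only: anchor, last_engaged, max_ts and last_engaged_ts are
-- illustrative names and do not come from the documentation sample.
WITH anchor AS (
  -- Newest event in the dataset, in microseconds since the Unix epoch.
  SELECT MAX(event_timestamp) AS max_ts
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
),
last_engaged AS (
  -- Most recent event per user with engagement_time_msec > 0.
  SELECT
    user_pseudo_id,
    MAX(event_timestamp) AS last_engaged_ts
  FROM
    `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS T
  CROSS JOIN
    T.event_params AS ep
  WHERE
    ep.key = 'engagement_time_msec' AND ep.value.int_value > 0
  GROUP BY
    user_pseudo_id
)
SELECT
  -- COUNT(column) skips NULL ids, as COUNT(DISTINCT ...) does above.
  COUNT(user_pseudo_id) AS n_day_inactive_users_count
FROM
  last_engaged
CROSS JOIN
  anchor
WHERE
  /* Has engaged in last M = 7 days */
  last_engaged_ts > max_ts - 7 * 24 * 60 * 60 * 1000000
  /* Has not engaged in last N = 2 days */
  AND last_engaged_ts <= max_ts - 2 * 24 * 60 * 60 * 1000000;
```

A user counts as N-day inactive exactly when their latest engaged event falls after the 7-day cutoff but not after the 2-day cutoff, so a single pass with GROUP BY should cover both conditions without the self-join.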