SQL - Unequal left join BigQuery
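New here. I'm trying to get daily and weekly active users over time. A user has 30 days after their last activity before they are considered inactive. My goal is to create graphs that can be split by user_id to show cohorts, regions, categories, etc.
I've created a date table to get every day of the time period, and I have a simplified Order table with the basic information I need to calculate this.
I'm trying to do a left join to get the status by date, using the following SQL query: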
WITH daily_use AS (
  SELECT
    __key__.id AS user_id
    , DATE_TRUNC(DATE(placeOrderDate), DAY) AS activity_date
  FROM `analysis.Order`
  WHERE isBuyingGroupOrder = TRUE
    AND testOrder = FALSE
  GROUP BY 1, 2
),
dates AS (
  SELECT DATE_ADD(DATE "2016-01-01", INTERVAL d.d DAY) AS date
  FROM
  (
    SELECT ROW_NUMBER() OVER(ORDER BY __key__.id) -1 AS d
    FROM `analysis.Order`
    ORDER BY __key__.id
    LIMIT 1096
  ) AS d
  ORDER BY 1 DESC
)
SELECT
  daily_use.user_id
  , wd.date AS date
  , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
LEFT JOIN daily_use
  ON wd.date >= daily_use.activity_date
  AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
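I'm getting this error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. I'd like to know how to work around this in BigQuery. I'm using Standard SQL in BigQuery.
Thanks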
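The query below is for BigQuery Standard SQL. It mostly reproduces the logic of your query (and reuses your daily_use and dates CTEs), but it leaves out days with no activity at all: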
#standardSQL
SELECT
  daily_use.user_id
  , wd.date AS date
  , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
-- Half-open 30-day window, same as the ON clause in the question
-- (BETWEEN is inclusive on both ends and would also match day 30)
WHERE wd.date >= daily_use.activity_date
  AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
-- ORDER BY 1,2
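To try the cross join plus filter without touching `analysis.Order`, here is a minimal sketch that runs on inline sample data. The three activity rows and the three-month date range are made up for illustration only, and the window is the half-open 30-day condition from the question.

```sql
#standardSQL
WITH daily_use AS (
  -- Hypothetical sample activity, not taken from the real Order table
  SELECT 1 AS user_id, DATE '2016-01-01' AS activity_date UNION ALL
  SELECT 1, DATE '2016-01-10' UNION ALL
  SELECT 2, DATE '2016-01-05'
),
dates AS (
  -- One row per calendar day in an arbitrary illustrative range
  SELECT d AS date
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2016-01-01', DATE '2016-03-31')) AS d
)
SELECT
  daily_use.user_id
  , wd.date AS date
  , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
-- Keep only (date, activity) pairs where the activity is at most 29 days old
WHERE wd.date >= daily_use.activity_date
  AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY 1, 2
```

With this sample data, user 1's counter drops back to 0 on 2016-01-10, because MIN picks the most recent activity inside the window, and the last row for user 1 is 2016-02-08 with a value of 29. Dates after that produce no row for user 1, which is the "days with no activity are left out" behavior described above.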
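If for some reason you still need to reproduce your logic exactly, including the dates with no activity, you can put a final LEFT JOIN on top of the above, as below: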
#standardSQL
SELECT *
FROM dates AS wd
LEFT JOIN (
  SELECT
    daily_use.user_id
    , wd.date AS date
    , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
  FROM dates AS wd
  CROSS JOIN daily_use
  -- Half-open 30-day window, same as the ON clause in the question
  WHERE wd.date >= daily_use.activity_date
    AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
  GROUP BY 1,2
) AS daily_use
-- The equality join on date is what keeps LEFT JOIN valid here
USING (date)
-- ORDER BY 1,2
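A side note on the dates CTE from the question: it numbers rows of `analysis.Order`, so it only yields all 1096 days if that table has at least 1096 rows. A possible alternative, sketched here under the assumption that the intended range is exactly 1096 consecutive days starting 2016-01-01 (that is, through 2018-12-31), is to generate the calendar directly with `GENERATE_DATE_ARRAY`. The final SELECT is only a sanity check of the range and is not part of the original queries.

```sql
#standardSQL
WITH dates AS (
  -- 2016-01-01 through 2018-12-31 inclusive: 366 + 365 + 365 = 1096 days
  SELECT d AS date
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2016-01-01', DATE '2018-12-31')) AS d
)
-- Sanity check: first day, last day, and number of generated days
SELECT
  MIN(dates.date) AS first_day
  , MAX(dates.date) AS last_day
  , COUNT(*) AS n_days
FROM dates
```

If this fits your data, the `dates AS (...)` definition can be swapped into the WITH clause of either query above without changing anything else, since it exposes the same single `date` column.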