BIGQUERY: Replace a 'null' result from one table with a user id from another table

Trusty BQ experts,

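Background: I have users who read articles on the website (User Table A) and users who click through to articles from email (User Table B), and there is a BQ view for each set of users. User Table A is missing the user_id for some users who clicked through from email. See User Table A below.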

*User Table A* - Website

id  | user_id    | article id  | viewed_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 2          | 1001        | 2019-01-25 01:03:00 UTC
3   | 3          | 1002        | 2019-01-25 01:03:00 UTC
4   | null       | 1001        | 2019-01-25 01:04:00 UTC
5   | null       | 1000        | 2019-01-24 20:49:00 UTC
6   | null       | 1003        | 2019-01-24 20:47:00 UTC


*User Table B* - Email

id  | user_id    | article id  | clicked_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 1          | 1000        | 2019-01-24 20:49:00 UTC
3   | 6          | 1003        | 2019-01-24 20:47:00 UTC

*Desired Result Table*

id  | user_id    | article id  | viewed_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 2          | 1001        | 2019-01-25 01:03:00 UTC
3   | 3          | 1002        | 2019-01-25 01:03:00 UTC
4   | null       | 1001        | 2019-01-25 01:04:00 UTC
5   | 1          | 1000        | 2019-01-24 20:49:00 UTC
6   | 6          | 1003        | 2019-01-24 20:47:00 UTC

I hope this makes sense.

Please help. This has been bugging me for months.

I think you can use a left join:

select w.id,
       coalesce(w.user_id, e.user_id) as user_id,
       w.article_id, w.viewed_at
from website w left join
     email e
     on w.article_id = e.article_id and
        w.viewed_at = e.clicked_at and
        w.user_id is null;

Note that this logic assumes the email table has no duplicates on article_id/clicked_at; if it does, each matching website row is returned once per duplicate.
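
If the email table can contain more than one row for the same article_id/clicked_at pair, the join above would duplicate website rows. A minimal sketch of one way around that is to collapse the email table to one row per pair first. It assumes you are happy to leave user_id as null when the matching email rows disagree about the user. The `project.dataset.*` names are placeholders and email_dedup is just an illustrative name:

```sql
#standardSQL
-- Sketch: reduce the email table to one row per (article_id, clicked_at)
-- so each website row can match at most one email row.
WITH email_dedup AS (
  SELECT
    article_id,
    clicked_at,
    -- Keep a user_id only when every duplicate agrees on it;
    -- otherwise leave it NULL rather than guess.
    IF(COUNT(DISTINCT user_id) = 1, MIN(user_id), NULL) AS user_id
  FROM `project.dataset.email`
  GROUP BY article_id, clicked_at
)
SELECT
  w.id,
  COALESCE(w.user_id, e.user_id) AS user_id,
  w.article_id,
  w.viewed_at
FROM `project.dataset.website` AS w
LEFT JOIN email_dedup AS e
  ON w.user_id IS NULL
  AND w.article_id = e.article_id
  AND w.viewed_at = e.clicked_at
```

If any matching user is good enough for you, replacing the IF(...) expression with a plain MIN(user_id) would also work.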

The following works for BigQuery Standard SQL:

#standardSQL
SELECT
  a.id,
  IFNULL(a.user_id, b.user_id) AS user_id,
  a.article_id,
  a.viewed_at
FROM `project.dataset.website` a
LEFT JOIN `project.dataset.email` b
ON a.user_id IS NULL
AND a.article_id = b.article_id
AND a.viewed_at = b.clicked_at
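
To sanity-check the query without touching real tables, you can inline the sample rows from the question as CTEs. This is only a sketch: the CTE names website and email stand in for the real views, and the column is assumed to be named article_id (the question's tables show it as "article id").

```sql
#standardSQL
-- Sample data copied from the question; website and email are
-- stand-in CTE names, not the real views.
WITH website AS (
  SELECT 1 AS id, 1 AS user_id, 1000 AS article_id,
         TIMESTAMP '2019-01-25 01:04:00 UTC' AS viewed_at UNION ALL
  SELECT 2, 2,    1001, TIMESTAMP '2019-01-25 01:03:00 UTC' UNION ALL
  SELECT 3, 3,    1002, TIMESTAMP '2019-01-25 01:03:00 UTC' UNION ALL
  SELECT 4, NULL, 1001, TIMESTAMP '2019-01-25 01:04:00 UTC' UNION ALL
  SELECT 5, NULL, 1000, TIMESTAMP '2019-01-24 20:49:00 UTC' UNION ALL
  SELECT 6, NULL, 1003, TIMESTAMP '2019-01-24 20:47:00 UTC'
), email AS (
  SELECT 1 AS id, 1 AS user_id, 1000 AS article_id,
         TIMESTAMP '2019-01-25 01:04:00 UTC' AS clicked_at UNION ALL
  SELECT 2, 1, 1000, TIMESTAMP '2019-01-24 20:49:00 UTC' UNION ALL
  SELECT 3, 6, 1003, TIMESTAMP '2019-01-24 20:47:00 UTC'
)
SELECT
  a.id,
  IFNULL(a.user_id, b.user_id) AS user_id,
  a.article_id,
  a.viewed_at
FROM website a
LEFT JOIN email b
  ON a.user_id IS NULL          -- only try to fill rows that lack a user
  AND a.article_id = b.article_id
  AND a.viewed_at = b.clicked_at
ORDER BY a.id
```

With the sample data this should return the rows shown in the Desired Result Table: ids 5 and 6 pick up user_id 1 and 6 from the email rows, while id 4 stays null because no email row matches article 1001.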