Joining two datasets with subqueries

Question

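I am trying to join two large datasets using BigQuery. They have a common field, but the field has a different name in each dataset.

I want to count rows and sum the results of CASE logic for both table 1 and table 2.

I think I'm hitting errors caused by the subselects (subqueries?) and by syntax mistakes. I tried to apply precedents from similar posts, but I still seem to be missing something. Any help getting this sorted out is greatly appreciated.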

SELECT
table1.field1,
table1.field2,
    (
    SELECT COUNT (*)
    FROM table1) AS table1_total,
sum(case when table1.mutually_exclusive_metric1 = "Y" then 1 else 0 end) AS t1_pass_1,
sum(case when table1.mutually_exclusive_metric1 = "Y" AND table1.mutually_exclusive_metric2 IS null OR table1.mutually_exclusive_metric3 = 'Y' then 1 else 0 end) AS t1_pass_2, 
sum(case when table1.mutually_exclusive_metric3 ="Y" AND table1.mutually_exclusive_metric2 ="Y" AND table1.mutually_exclusive_metric3 ="Y" then 1 else 0 end) AS  t1_pass_3,
    (
    SELECT COUNT (*)
    FROM table2) AS table2_total,
sum(case when table2.metric1 IS true then 1 else 0 end) AS t2_pass_1,
sum(case when table2.metric2 IS true then 1 else 0 end) AS t2_pass_2,
    (
        SELECT COUNT (*)
        FROM dataset1.table1 JOIN EACH dataset2.table2 ON common_field_table1 =  common_field_table2) AS overlap 
FROM
dataset1.table1,
dataset2.table2
WHERE
XYZ

Thanks in advance!

Answer 1

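Phew. Let's take this one step at a time:
1) Using * is not explicit, and explicit is good. Also, listing selects explicitly alongside * will duplicate those selects, with automatic renaming: table1.field becomes table1_field. Unless you are just playing around, don't use *.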

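2) You never actually joined. A query with a join looks like this (note the order of the WHERE and GROUP BY clauses, and note how each table and field is named):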

SELECT
  t1.field1 AS field1,
  t2.field2 AS field2
FROM dataset1.table1 AS t1

JOIN dataset2.table2 AS t2
ON t1.field1 = t2.field1

WHERE t1.field1 = "some value"

GROUP BY field1, field2

Here t1.field1 and t2.field1 hold the corresponding values, so the join condition goes in the ON clause. You don't repeat it in the select.
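Applied to the question, where the common field has a different name in each table, the same pattern might look like the sketch below. It assumes BigQuery standard SQL (so a plain JOIN instead of JOIN EACH) and reuses the placeholder names from the question (common_field_table1, common_field_table2). The overlap alias is only illustrative.

```sql
-- Sketch only: names come from the question's placeholders.
-- The ON clause compares the two differently named common fields.
SELECT
  t1.field1 AS field1,
  t1.field2 AS field2,
  COUNT(*) AS overlap              -- joined rows per (field1, field2)
FROM dataset1.table1 AS t1
JOIN dataset2.table2 AS t2
  ON t1.common_field_table1 = t2.common_field_table2
GROUP BY t1.field1, t1.field2
```

Qualifying every column with its table alias avoids ambiguity if both tables happen to share a column name.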

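3) Use whitespace to make your code easier to read. It helps everyone involved, including you.

4) Your subselects are useless. A subselect is used in place of creating a new table. You would use one, for instance, to group or filter data from an existing table. For example: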

SELECT
  subselect.field1 AS ssf1,
  subselect.max_f1 AS ss_max_f1
FROM (
    SELECT
        t1.field1 AS field1,
        MAX(t1.field1) AS max_f1,
    FROM dataset1.table1 AS t1

    GROUP BY field1
) AS subselect

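A subselect is effectively a new table that you select from. Treat it logically as if it happens first, then take its results and use them in your main select.

5) This is a bad question. It doesn't even look like you tried to solve the problem one step at a time.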
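One way to use that idea for the totals in the question is to aggregate each table in its own subselect, then combine the one-row results. This is a sketch under assumptions, not a tested answer: it assumes BigQuery standard SQL, computes totals over whole tables rather than per field1/field2, and shows only one CASE sum per table. The aliases t1_stats, t2_stats and overlap_stats are made up for illustration.

```sql
-- Sketch only: each subselect yields exactly one row,
-- so the CROSS JOINs produce a single combined row.
SELECT
  t1_stats.table1_total,
  t1_stats.t1_pass_1,
  t2_stats.table2_total,
  t2_stats.t2_pass_1,
  overlap_stats.overlap
FROM (
  SELECT
    COUNT(*) AS table1_total,
    SUM(CASE WHEN mutually_exclusive_metric1 = 'Y' THEN 1 ELSE 0 END) AS t1_pass_1
  FROM dataset1.table1
) AS t1_stats
CROSS JOIN (
  SELECT
    COUNT(*) AS table2_total,
    SUM(CASE WHEN metric1 IS TRUE THEN 1 ELSE 0 END) AS t2_pass_1
  FROM dataset2.table2
) AS t2_stats
CROSS JOIN (
  -- Rows that match across the two tables on the common field.
  SELECT COUNT(*) AS overlap
  FROM dataset1.table1 AS t1
  JOIN dataset2.table2 AS t2
    ON t1.common_field_table1 = t2.common_field_table2
) AS overlap_stats
```

Aggregating before combining keeps each count independent, whereas summing after a comma join of the two tables would count every row once per matching pair.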

google-bigquery