Joining multiple tables in BigQuery
I want to join multiple tables in BigQuery, but the existing solution did not help me get the output I want.
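My starting point is as follows: I create 5 separate tables, one per possible rating value, each showing how often a page received that rating. See the sample output here:
[Image: raw tables]
Each table is created like this (shown for the five-star table):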
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN (
SELECT
FORMAT_DATE('%y%m%d',
DATE('2018-06-01')))
AND (
SELECT
FORMAT_DATE('%y%m%d',
DATE('2018-06-30'))));
SELECT
h.page.pagePath AS page,
COUNT(h.eventInfo.eventLabel) AS five_star
FROM
`table.ga_sessions_20*` AS t,
t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel ='5'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
group by 1
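The `tables_in_range` UDF works because zero-padded date strings sort in calendar order, so a plain string `BETWEEN` acts as a date filter. The following Python sketch illustrates that check. It assumes shard names of the form `ga_sessions_YYYYMMDD` (the shard list itself is hypothetical) and shows why the wildcard `ga_sessions_20*` pairs with a six-character `%y%m%d` suffix:

```python
from datetime import date


def tables_in_range(suffix: str) -> bool:
    # Mirrors the UDF: FORMAT_DATE('%y%m%d', ...) produces e.g. '180601',
    # because the wildcard `ga_sessions_20*` has already consumed the "20".
    lo = date(2018, 6, 1).strftime("%y%m%d")
    hi = date(2018, 6, 30).strftime("%y%m%d")
    # Zero-padded YYMMDD strings compare lexicographically in date order.
    return lo <= suffix <= hi


# Hypothetical shard names; the suffix is whatever follows the wildcard prefix.
shards = [
    "ga_sessions_20180531",
    "ga_sessions_20180601",
    "ga_sessions_20180615",
    "ga_sessions_20180630",
    "ga_sessions_20180701",
]
prefix = "ga_sessions_20"
selected = [s for s in shards if tables_in_range(s[len(prefix):])]
print(selected)
```

If the wildcard were `ga_sessions_*` instead, the suffix would be the full `YYYYMMDD`, and the bounds would need `%Y%m%d`.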
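When I join the tables as shown below, I unfortunately do not get the expected result.
Instead, the join only keeps pages that are common to all 5 tables, i.e. pages that have received every one of the five possible ratings from 1 to 5. See the sample output below.
[Image: joined table results]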
select
five_star.page as page,
five_star.five_star as five_star,
four_star.four_star as four_star,
three_star.three_star as three_star,
two_star.two_star as two_star,
one_star.one_star as one_star
from five_star
join four_star using (page)
join three_star using (page)
join two_star using (page)
JOIN one_star using (page)
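What I want to achieve with my join is a table like this:
[Image: desired output]
The problem, as I see it, is that if a page has not received a certain rating, it currently drops out of the query. Unfortunately, I was not able to find a solution with UNION ALL, CROSS JOIN or LEFT JOIN, so I would greatly appreciate any support here!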
The following is for BigQuery Standard SQL:
#standardSQL
SELECT
page,
SUM(five_star_rating) five_star_rating,
SUM(four_star_rating) four_star_rating,
SUM(three_star_rating) three_star_rating,
SUM(two_star_rating) two_star_rating,
SUM(one_star_rating) one_star_rating
FROM (
SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1`
)
GROUP BY page
You can test it with dummy data based on your question, as below:
#standardSQL
WITH `project.dataset.table5` AS (
SELECT 'A' page, 1 five_star_rating UNION ALL
SELECT 'B', 1 UNION ALL
SELECT 'C', 1
), `project.dataset.table4` AS (
SELECT 'C' page, 1 four_star_rating UNION ALL
SELECT 'D', 1 UNION ALL
SELECT 'F', 1
), `project.dataset.table3` AS (
SELECT 'F' page, 1 three_star_rating UNION ALL
SELECT 'G', 1 UNION ALL
SELECT 'H', 1
), `project.dataset.table2` AS (
SELECT 'H' page, 1 two_star_rating UNION ALL
SELECT 'I', 1 UNION ALL
SELECT 'J', 1
), `project.dataset.table1` AS (
SELECT 'J' page, 1 one_star_rating UNION ALL
SELECT 'K', 1 UNION ALL
SELECT 'L', 1
)
SELECT
page,
SUM(five_star_rating) five_star_rating,
SUM(four_star_rating) four_star_rating,
SUM(three_star_rating) three_star_rating,
SUM(two_star_rating) two_star_rating,
SUM(one_star_rating) one_star_rating
FROM (
SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1`
)
GROUP BY page
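To try the UNION ALL pattern without a BigQuery project, here is a sketch that runs the same query against an in-memory SQLite database from Python's standard library. It assumes SQLite is an acceptable stand-in for this particular query; the table names `table1` to `table5` and the dummy pages come from the test above, and the `ORDER BY page` is added only to make the output deterministic:

```python
import sqlite3

# Dummy data from the test above: table<N> lists pages with one N-star rating each.
DUMMY = {
    5: ("five_star_rating", ["A", "B", "C"]),
    4: ("four_star_rating", ["C", "D", "F"]),
    3: ("three_star_rating", ["F", "G", "H"]),
    2: ("two_star_rating", ["H", "I", "J"]),
    1: ("one_star_rating", ["J", "K", "L"]),
}

conn = sqlite3.connect(":memory:")
for n, (col, pages) in DUMMY.items():
    conn.execute(f"CREATE TABLE table{n} (page TEXT, {col} INTEGER)")
    conn.executemany(f"INSERT INTO table{n} VALUES (?, 1)", [(p,) for p in pages])

# Pad every table to the same five rating columns with zeros, stack them
# with UNION ALL, then collapse to one row per page with SUM.
rows = conn.execute("""
    SELECT page,
           SUM(five_star_rating), SUM(four_star_rating), SUM(three_star_rating),
           SUM(two_star_rating), SUM(one_star_rating)
    FROM (
      SELECT page, 0 AS one_star_rating, 0 AS two_star_rating, 0 AS three_star_rating,
             0 AS four_star_rating, five_star_rating FROM table5 UNION ALL
      SELECT page, 0, 0, 0, four_star_rating, 0 FROM table4 UNION ALL
      SELECT page, 0, 0, three_star_rating, 0, 0 FROM table3 UNION ALL
      SELECT page, 0, two_star_rating, 0, 0, 0 FROM table2 UNION ALL
      SELECT page, one_star_rating, 0, 0, 0, 0 FROM table1
    )
    GROUP BY page
    ORDER BY page
""").fetchall()

for row in rows:
    print(row)
```

Every page from A to L appears exactly once, including pages such as K and L that exist in only one table, which is exactly what the inner joins in the question lose.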
> Unfortunately, I was not able to find a solution with Union all, Cross Join or left join ...
Another option is to use FULL JOIN, as in the example below:
#standardSQL
SELECT
COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
IFNULL(five_star.five_star_rating, 0) AS five_star,
IFNULL(four_star.four_star_rating, 0) AS four_star,
IFNULL(three_star.three_star_rating, 0) AS three_star,
IFNULL(two_star.two_star_rating, 0) AS two_star,
IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)
You can test it with dummy data based on your question, as below:
#standardSQL
WITH `project.dataset.table5` AS (
SELECT 'A' page, 1 five_star_rating UNION ALL
SELECT 'B', 1 UNION ALL
SELECT 'C', 1
), `project.dataset.table4` AS (
SELECT 'C' page, 1 four_star_rating UNION ALL
SELECT 'D', 1 UNION ALL
SELECT 'F', 1
), `project.dataset.table3` AS (
SELECT 'F' page, 1 three_star_rating UNION ALL
SELECT 'G', 1 UNION ALL
SELECT 'H', 1
), `project.dataset.table2` AS (
SELECT 'H' page, 1 two_star_rating UNION ALL
SELECT 'I', 1 UNION ALL
SELECT 'J', 1
), `project.dataset.table1` AS (
SELECT 'J' page, 1 one_star_rating UNION ALL
SELECT 'K', 1 UNION ALL
SELECT 'L', 1
)
SELECT
COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
IFNULL(five_star.five_star_rating, 0) AS five_star,
IFNULL(four_star.four_star_rating, 0) AS four_star,
IFNULL(three_star.three_star_rating, 0) AS three_star,
IFNULL(two_star.two_star_rating, 0) AS two_star,
IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)
The result is as expected:
Row  page  five_star  four_star  three_star  two_star  one_star
1    A     1          0          0           0         0
2    B     1          0          0           0         0
3    C     1          1          0           0         0
4    D     0          1          0           0         0
5    F     0          1          1           0         0
6    G     0          0          1           0         0
7    H     0          0          1           1         0
8    I     0          0          0           1         0
9    J     0          0          0           1         1
10   K     0          0          0           0         1
11   L     0          0          0           0         1
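The problem with your query: an inner JOIN keeps a page only if it appears in every one of the five tables, so any page that is missing even one rating value is dropped. That is why a FULL OUTER JOIN is recommended: it keeps every row of the leftmost table and also adds new rows for pages that only appear in the tables joined to it.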
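The difference between the two join types can be sketched in plain Python, modelling each rating table as a hypothetical `page -> count` dictionary built from the dummy data above. This is only an illustration of the semantics (key intersection versus key union, with `0` filling the gaps the way `IFNULL(..., 0)` does), not how BigQuery executes the join:

```python
# Hypothetical stand-ins for the five rating tables: page -> count.
five = {"A": 1, "B": 1, "C": 1}
four = {"C": 1, "D": 1, "F": 1}
three = {"F": 1, "G": 1, "H": 1}
two = {"H": 1, "I": 1, "J": 1}
one = {"J": 1, "K": 1, "L": 1}
tables = [five, four, three, two, one]


def inner_join_pages(tables):
    # INNER JOIN ... USING (page): a page survives only if every table has it.
    return sorted(set(tables[0]).intersection(*tables[1:]))


def full_outer_join(tables):
    # FULL JOIN ... USING (page): a page survives if at least one table has it.
    pages = sorted(set().union(*tables))
    # t.get(page, 0) plays the role of IFNULL(rating, 0) for absent pages.
    return [(page, *(t.get(page, 0) for t in tables)) for page in pages]


print(inner_join_pages(tables))  # []
for row in full_outer_join(tables):
    print(row)
```

With this data no page is present in all five tables, so the inner join returns nothing, while the full outer join returns all 11 pages.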
I think in your case the solution is much simpler and needs no joins at all, because all the data lives in the same table.
This one is flat, without a pivot:
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
SELECT
h.page.pagePath AS page,
h.eventInfo.eventLabel stars,
COUNT(1) as events
FROM
`project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel IN ('1', '2', '3', '4', '5') -- a string BETWEEN '1' AND '5' would also match labels such as '10'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
GROUP BY 1, 2
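A local sketch of the flat query, using SQLite from Python's standard library. It assumes a hypothetical, already-flattened `hits` table (one row per hit, standing in for `ga_sessions_*` cross-joined with `t.hits`), made-up sample rows, and `LIKE '%/xyz/%'` as a stand-in for `REGEXP_CONTAINS`, which SQLite does not provide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for `ga_sessions_*` AS t, t.hits AS h.
conn.execute("CREATE TABLE hits (page TEXT, action TEXT, label TEXT, type TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", [
    ("/xyz/a", "rating", "5", "EVENT"),
    ("/xyz/a", "rating", "5", "EVENT"),
    ("/xyz/a", "rating", "3", "EVENT"),
    ("/xyz/b", "rating", "1", "EVENT"),
    ("/xyz/b", "click", "5", "EVENT"),   # not a rating event: filtered out
    ("/other", "rating", "4", "EVENT"),  # path does not match: filtered out
])

# One row per (page, stars) pair; a missing rating simply has no row.
rows = conn.execute("""
    SELECT page, label AS stars, COUNT(1) AS events
    FROM hits
    WHERE action = 'rating'
      AND label IN ('1', '2', '3', '4', '5')
      AND page LIKE '%/xyz/%'
      AND type = 'EVENT'
    GROUP BY page, label
    ORDER BY page, stars
""").fetchall()
print(rows)
```

Because every rating value comes from the same rows, there is nothing to join: pages with missing ratings never disappear, they just have fewer rows.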
If you really need pivot-table-like columns, it looks like this:
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
SELECT
h.page.pagePath AS page,
SUM( IF(h.eventInfo.eventLabel = '1', 1, 0) ) as oneStarEvents,
SUM( IF(h.eventInfo.eventLabel = '2', 1, 0) ) as twoStarEvents,
SUM( IF(h.eventInfo.eventLabel = '3', 1, 0) ) as threeStarEvents,
SUM( IF(h.eventInfo.eventLabel = '4', 1, 0) ) as fourStarEvents,
SUM( IF(h.eventInfo.eventLabel = '5', 1, 0) ) as fiveStarEvents
FROM
`project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel IN ('1', '2', '3', '4', '5') -- a string BETWEEN '1' AND '5' would also match labels such as '10'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
GROUP BY 1
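The same conditional-aggregation pivot can be sketched locally with SQLite. The `hits` table and its rows are the same hypothetical stand-ins as in the flat example, and `CASE WHEN ... THEN 1 ELSE 0 END` replaces BigQuery's `IF(...)` so the query runs on any SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for `ga_sessions_*` AS t, t.hits AS h.
conn.execute("CREATE TABLE hits (page TEXT, action TEXT, label TEXT, type TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", [
    ("/xyz/a", "rating", "5", "EVENT"),
    ("/xyz/a", "rating", "5", "EVENT"),
    ("/xyz/a", "rating", "3", "EVENT"),
    ("/xyz/b", "rating", "1", "EVENT"),
    ("/xyz/b", "click", "5", "EVENT"),   # not a rating event: filtered out
    ("/other", "rating", "4", "EVENT"),  # path does not match: filtered out
])

# Each SUM counts only the rows whose label matches its column, so every
# page gets all five columns, with 0 where it has no such rating.
rows = conn.execute("""
    SELECT page,
           SUM(CASE WHEN label = '1' THEN 1 ELSE 0 END) AS oneStarEvents,
           SUM(CASE WHEN label = '2' THEN 1 ELSE 0 END) AS twoStarEvents,
           SUM(CASE WHEN label = '3' THEN 1 ELSE 0 END) AS threeStarEvents,
           SUM(CASE WHEN label = '4' THEN 1 ELSE 0 END) AS fourStarEvents,
           SUM(CASE WHEN label = '5' THEN 1 ELSE 0 END) AS fiveStarEvents
    FROM hits
    WHERE action = 'rating'
      AND label IN ('1', '2', '3', '4', '5')
      AND page LIKE '%/xyz/%'
      AND type = 'EVENT'
    GROUP BY page
    ORDER BY page
""").fetchall()
print(rows)
```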
Instead of SUM(IF(condition,1,0)), you can also use COUNT(IF(condition,1,NULL)).
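Both forms count the same rows because `COUNT` skips NULLs. The sketch below checks this with SQLite, using `CASE` in place of BigQuery's `IF` and a hypothetical one-column `hits` table. It also shows the one case where they differ: over zero input rows, `SUM` returns NULL while `COUNT` returns 0 (standard SQL behaviour, shared by BigQuery):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only rating labels.
conn.execute("CREATE TABLE hits (label TEXT)")
conn.executemany("INSERT INTO hits VALUES (?)", [("5",), ("5",), ("3",), ("1",)])

QUERY = """
    SELECT SUM(CASE WHEN label = '5' THEN 1 ELSE 0 END),  -- SUM(IF(c, 1, 0))
           COUNT(CASE WHEN label = '5' THEN 1 END)        -- COUNT(IF(c, 1, NULL))
    FROM hits
"""
row = conn.execute(QUERY).fetchone()
# With no input rows at all, SUM yields NULL but COUNT yields 0.
empty = conn.execute(QUERY + " WHERE 0").fetchone()
print(row, empty)
```

Inside a `GROUP BY page` every group has at least one row, so both forms give identical results in the pivot query above.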
The first one!