Having trouble joining multiple Reddit tables with an AS and ON clause
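I'm trying to join comments to posts across multiple tables. I need an AS clause because the posts tables and the comments tables share a column, 'score'.
My goal is to use the data from all of these tables to find the top comments on the top posts.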
#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore,
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH),
comments.body, comments.score AS commentsscore, comments.id
FROM
fh-bigquery.reddit_posts.2015_12
,
fh-bigquery.reddit_posts.2016_01
,
fh-bigquery.reddit_posts.2016_02
,
fh-bigquery.reddit_posts.2016_03
,
fh-bigquery.reddit_posts.2016_04
,
fh-bigquery.reddit_posts.2016_05
,
fh-bigquery.reddit_posts.2016_06
,
fh-bigquery.reddit_posts.2016_07
,
fh-bigquery.reddit_posts.2016_08
,
fh-bigquery.reddit_posts.2016_09
,
fh-bigquery.reddit_posts.2016_10
,
fh-bigquery.reddit_posts.2016_11
,
fh-bigquery.reddit_posts.2016_12
,
fh-bigquery.reddit_posts.2017_01
,
fh-bigquery.reddit_posts.2017_02
,
fh-bigquery.reddit_posts.2017_03
,
fh-bigquery.reddit_posts.2017_04
,
fh-bigquery.reddit_posts.2017_05
,
fh-bigquery.reddit_posts.2017_06
,
fh-bigquery.reddit_posts.2017_07
,
fh-bigquery.reddit_posts.2017_08
,
fh-bigquery.reddit_posts.2017_09
,
fh-bigquery.reddit_posts.2017_10
,
fh-bigquery.reddit_posts.2017_11
,
fh-bigquery.reddit_posts.2017_12
,
fh-bigquery.reddit_posts.2018_01
,
fh-bigquery.reddit_posts.2018_02
,
fh-bigquery.reddit_posts.2018_03
,
fh-bigquery.reddit_posts.2018_04
,
fh-bigquery.reddit_posts.2018_05
,
fh-bigquery.reddit_posts.2018_06
,
fh-bigquery.reddit_posts.2018_07
,
fh-bigquery.reddit_posts.2018_08
,
fh-bigquery.reddit_posts.2018_09
,
fh-bigquery.reddit_posts.2018_10
AS posts
JOIN
fh-bigquery.reddit_comments.2015_12
,
fh-bigquery.reddit_comments.2016_01
,
fh-bigquery.reddit_comments.2016_02
,
fh-bigquery.reddit_comments.2016_03
,
fh-bigquery.reddit_comments.2016_04
,
fh-bigquery.reddit_comments.2016_05
,
fh-bigquery.reddit_comments.2016_06
,
fh-bigquery.reddit_comments.2016_07
,
fh-bigquery.reddit_comments.2016_08
,
fh-bigquery.reddit_comments.2016_09
,
fh-bigquery.reddit_comments.2016_10
,
fh-bigquery.reddit_comments.2016_11
,
fh-bigquery.reddit_comments.2016_12
,
fh-bigquery.reddit_comments.2017_01
,
fh-bigquery.reddit_comments.2017_02
,
fh-bigquery.reddit_comments.2017_03
,
fh-bigquery.reddit_comments.2017_04
,
fh-bigquery.reddit_comments.2017_05
,
fh-bigquery.reddit_comments.2017_06
,
fh-bigquery.reddit_comments.2017_07
,
fh-bigquery.reddit_comments.2017_08
,
fh-bigquery.reddit_comments.2017_09
,
fh-bigquery.reddit_comments.2017_10
,
fh-bigquery.reddit_comments.2017_11
,
fh-bigquery.reddit_comments.2017_12
,
fh-bigquery.reddit_comments.2018_01
,
fh-bigquery.reddit_comments.2018_02
,
fh-bigquery.reddit_comments.2018_03
,
fh-bigquery.reddit_comments.2018_04
,
fh-bigquery.reddit_comments.2018_05
,
fh-bigquery.reddit_comments.2018_06
,
fh-bigquery.reddit_comments.2018_07
,
fh-bigquery.reddit_comments.2018_08
,
fh-bigquery.reddit_comments.2018_09
,
fh-bigquery.reddit_comments.2018_10
AS comments
ON posts.id = SUBSTR(comments.link_id, 4)
WHERE posts.subreddit = 'Showerthoughts' AND posts.score >100 AND comments.score >100
ORDER BY posts.score DESC
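OK, so the problems with this query:
- Careful! This query will process a lot of data. I could re-cluster the tables to make this kind of query more efficient, but I haven't done that yet.
- In #standardSQL, a comma means JOIN (a cross join), not UNION. So you need to UNION the tables (written UNION ALL in standard SQL).
- Shortcut: you can append a * to the end of a table name prefix to expand to all matching tables.
- Escape table names with backticks.
That said, a working query would be: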
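To see the difference between a comma and a UNION on a small scale, here is a minimal sketch. The inline tables a and b are made up for illustration and are not part of the Reddit datasets:

```sql
#standardSQL
-- Hypothetical inline tables: a has 2 rows, b has 3 rows.
WITH a AS (SELECT 1 AS id UNION ALL SELECT 2),
     b AS (SELECT 10 AS id UNION ALL SELECT 20 UNION ALL SELECT 30)
SELECT
  -- Comma = cross join: every row of a is paired with every row of b.
  (SELECT COUNT(*) FROM a, b) AS comma_rows,
  -- UNION ALL = the rows of a and the rows of b stacked into one result.
  (SELECT COUNT(*)
   FROM (SELECT id FROM a UNION ALL SELECT id FROM b)) AS union_rows
```

The comma version counts 2 x 3 = 6 rows, while the UNION ALL version counts 2 + 3 = 5. With dozens of monthly tables separated by commas, the cross join multiplies row counts instead of stacking them, which is not what the original query intended.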
#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore,
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH),
SUBSTR(comments.body, 0, 80), comments.score AS commentsscore, comments.id
FROM `fh-bigquery.reddit_posts.2015*` AS posts
JOIN `fh-bigquery.reddit_comments.2015*` AS comments
ON posts.id = SUBSTR(comments.link_id, 4)
WHERE posts.subreddit = 'Showerthoughts'
AND posts.score >100
AND comments.score >100
ORDER BY posts.score DESC
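Note that the wildcard above only matches the 2015 tables. To cover the range from the question (2015_12 through 2018_10), one option is a wider wildcard filtered with the `_TABLE_SUFFIX` pseudo-column. This is an untested sketch: it assumes every table matching `20*` in both datasets has a compatible schema, the column aliases `post_month` and `body_start` are names chosen here for readability, and it will scan far more data than the 2015-only query:

```sql
#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH) AS post_month,
  SUBSTR(comments.body, 0, 80) AS body_start,
  comments.score AS commentsscore, comments.id
FROM `fh-bigquery.reddit_posts.20*` AS posts
JOIN `fh-bigquery.reddit_comments.20*` AS comments
ON posts.id = SUBSTR(comments.link_id, 4)
-- With the prefix `20*`, _TABLE_SUFFIX is the rest of the table name,
-- e.g. '15_12' for 2015_12. The filters below keep 2015_12 .. 2018_10.
WHERE posts._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
AND comments._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
AND posts.subreddit = 'Showerthoughts'
AND posts.score > 100
AND comments.score > 100
ORDER BY posts.score DESC
```

Qualifying `_TABLE_SUFFIX` with the table alias matters here because both sides of the join are wildcard tables. The suffix comparison is a plain string comparison, which works because the monthly names sort chronologically.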