Having trouble joining multiple Reddit tables with an AS and ON clause

I am trying to join comments to posts across multiple tables. I need an AS clause because the posts tables and the comments tables share a column named 'score'.

My goal is to use the data from all of these tables to find the top comments on the top posts.

#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore, 
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
comments.body, comments.score AS commentsscore, comments.id

FROM

fh-bigquery.reddit_posts.2015_12, fh-bigquery.reddit_posts.2016_01, fh-bigquery.reddit_posts.2016_02, fh-bigquery.reddit_posts.2016_03, fh-bigquery.reddit_posts.2016_04, fh-bigquery.reddit_posts.2016_05, fh-bigquery.reddit_posts.2016_06, fh-bigquery.reddit_posts.2016_07, fh-bigquery.reddit_posts.2016_08, fh-bigquery.reddit_posts.2016_09, fh-bigquery.reddit_posts.2016_10, fh-bigquery.reddit_posts.2016_11, fh-bigquery.reddit_posts.2016_12, fh-bigquery.reddit_posts.2017_01, fh-bigquery.reddit_posts.2017_02, fh-bigquery.reddit_posts.2017_03, fh-bigquery.reddit_posts.2017_04, fh-bigquery.reddit_posts.2017_05, fh-bigquery.reddit_posts.2017_06, fh-bigquery.reddit_posts.2017_07, fh-bigquery.reddit_posts.2017_08, fh-bigquery.reddit_posts.2017_09, fh-bigquery.reddit_posts.2017_10, fh-bigquery.reddit_posts.2017_11, fh-bigquery.reddit_posts.2017_12, fh-bigquery.reddit_posts.2018_01, fh-bigquery.reddit_posts.2018_02, fh-bigquery.reddit_posts.2018_03, fh-bigquery.reddit_posts.2018_04, fh-bigquery.reddit_posts.2018_05, fh-bigquery.reddit_posts.2018_06, fh-bigquery.reddit_posts.2018_07, fh-bigquery.reddit_posts.2018_08, fh-bigquery.reddit_posts.2018_09, fh-bigquery.reddit_posts.2018_10

AS posts

JOIN

fh-bigquery.reddit_comments.2015_12, fh-bigquery.reddit_comments.2016_01, fh-bigquery.reddit_comments.2016_02, fh-bigquery.reddit_comments.2016_03, fh-bigquery.reddit_comments.2016_04, fh-bigquery.reddit_comments.2016_05, fh-bigquery.reddit_comments.2016_06, fh-bigquery.reddit_comments.2016_07, fh-bigquery.reddit_comments.2016_08, fh-bigquery.reddit_comments.2016_09, fh-bigquery.reddit_comments.2016_10, fh-bigquery.reddit_comments.2016_11, fh-bigquery.reddit_comments.2016_12, fh-bigquery.reddit_comments.2017_01, fh-bigquery.reddit_comments.2017_02, fh-bigquery.reddit_comments.2017_03, fh-bigquery.reddit_comments.2017_04, fh-bigquery.reddit_comments.2017_05, fh-bigquery.reddit_comments.2017_06, fh-bigquery.reddit_comments.2017_07, fh-bigquery.reddit_comments.2017_08, fh-bigquery.reddit_comments.2017_09, fh-bigquery.reddit_comments.2017_10, fh-bigquery.reddit_comments.2017_11, fh-bigquery.reddit_comments.2017_12, fh-bigquery.reddit_comments.2018_01, fh-bigquery.reddit_comments.2018_02, fh-bigquery.reddit_comments.2018_03, fh-bigquery.reddit_comments.2018_04, fh-bigquery.reddit_comments.2018_05, fh-bigquery.reddit_comments.2018_06, fh-bigquery.reddit_comments.2018_07, fh-bigquery.reddit_comments.2018_08, fh-bigquery.reddit_comments.2018_09, fh-bigquery.reddit_comments.2018_10

AS comments

ON posts.id = SUBSTR(comments.link_id, 4)

WHERE posts.subreddit = 'Showerthoughts' AND posts.score >100 AND comments.score >100
ORDER BY posts.score DESC

OK, so the problems with this query:

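  • Careful! This query will process a lot of data. I could re-cluster the tables to make this kind of query more efficient, but I haven't done so yet.
  • In #standardSQL, a comma between tables means JOIN (a cross join), not UNION as it did in legacy SQL. So you need to UNION the tables explicitly.
  • Shortcut: you can append a * to the end of a table name prefix to expand it to all matching tables.
  • Escape table names with backticks. Names such as fh-bigquery.reddit_posts.2015_12 contain a hyphen, so they will not parse unquoted.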
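
To illustrate the second and fourth points, here is a minimal sketch of an explicit UNION over two of the monthly posts tables, wrapped in a subquery so the combined rows can carry the posts alias. Only two tables are listed to keep it short, and the column list is trimmed to the columns the question already uses; treat it as an illustration of the pattern rather than the final query.

```sql
#standardSQL
-- Sketch only: two monthly tables shown; more can be appended with further UNION ALL branches.
SELECT posts.title, posts.url, posts.score AS postsscore
FROM (
  -- Each branch must select the same columns in the same order.
  SELECT title, url, score, subreddit FROM `fh-bigquery.reddit_posts.2015_12`
  UNION ALL
  SELECT title, url, score, subreddit FROM `fh-bigquery.reddit_posts.2016_01`
) AS posts
WHERE posts.subreddit = 'Showerthoughts'
  AND posts.score > 100
ORDER BY posts.score DESC
```

The wildcard shortcut from the third point replaces the whole subquery with a single table reference, which is why it is usually the more convenient choice here.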

With those points addressed, a working query would be:

#standardSQL
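-- The * wildcard expands to every table whose name starts with 2015.
SELECT posts.title, posts.url, posts.score AS postsscore,
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH) AS post_month,
-- Keep only the first 80 characters of each comment (SUBSTR positions start at 1).
SUBSTR(comments.body, 1, 80) AS body_start, comments.score AS commentsscore, comments.id

FROM `fh-bigquery.reddit_posts.2015*` AS posts
JOIN `fh-bigquery.reddit_comments.2015*` AS comments

-- link_id has a 3-character type prefix (t3_) in front of the post id, so skip it.
ON posts.id = SUBSTR(comments.link_id, 4)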

WHERE posts.subreddit = 'Showerthoughts' 
AND posts.score >100 
AND comments.score >100
ORDER BY posts.score DESC
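
The query above only covers tables whose names start with 2015. To reach the full range from the question (2015_12 through 2018_10), one option is a broader wildcard combined with a filter on the _TABLE_SUFFIX pseudo-column. The sketch below assumes that every table matching the 20* prefix in both datasets has a compatible schema, which I have not verified, and the post_month and body_start aliases are names chosen here for illustration. Expect it to scan far more data than the 2015-only version.

```sql
#standardSQL
-- Sketch under assumptions: all tables matching 20* share a compatible schema.
SELECT posts.title, posts.url, posts.score AS postsscore,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH) AS post_month,
  SUBSTR(comments.body, 1, 80) AS body_start,
  comments.score AS commentsscore, comments.id
FROM `fh-bigquery.reddit_posts.20*` AS posts
JOIN `fh-bigquery.reddit_comments.20*` AS comments
  ON posts.id = SUBSTR(comments.link_id, 4)
-- _TABLE_SUFFIX holds the part of the table name matched by *, e.g. '15_12' for 2015_12.
WHERE posts._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND comments._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND posts.subreddit = 'Showerthoughts'
  AND posts.score > 100
  AND comments.score > 100
ORDER BY posts.score DESC
```

The suffix comparison is a plain string comparison, so it relies on the zero-padded YYYY_MM naming the tables already use.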