Getting the top 5 rows by score for each group

I am trying to get the 5 highest-scoring comments for each Reddit post. More generally, I want to retrieve only the top N comments by score for each post title.

Example: with N = 2, I only want Comment 1 and Comment 2 for each post.

Post 1 | Comment 1 | Comment Score 10
Post 1 | Comment 2 | Comment Score 9
Post 1 | Comment 3 | Comment Score 8
Post 2 | Comment 1 | Comment Score 10
Post 2 | Comment 2 | Comment Score 9
Post 2 | Comment 3 | Comment Score 8

Standard SQL:

SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id
FROM 
    `fh-bigquery.reddit_posts.2015*` AS posts
    JOIN `fh-bigquery.reddit_comments.2015*` AS comments
        ON posts.id = SUBSTR(comments.link_id, 4)
WHERE 
    posts.subreddit = 'Showerthoughts' 
    AND posts.score >100 
    AND comments.score >100
ORDER BY 
    posts.score DESC, 
    posts.title DESC, 
    comments.score DESC

The following works in BigQuery Standard SQL:

#standardSQL
SELECT * EXCEPT(pos) FROM (
  SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id,
    ROW_NUMBER() OVER(PARTITION BY posts.url ORDER BY comments.score DESC) pos
  FROM `fh-bigquery.reddit_posts.2015*` AS posts
  JOIN `fh-bigquery.reddit_comments.2015*` AS comments
  ON posts.id = SUBSTR(comments.link_id, 4)
  WHERE posts.subreddit = 'Showerthoughts' 
  AND posts.score >100 
  AND comments.score >100
) 
WHERE pos < 3  -- keeps the top 2 comments per post; use pos <= 5 for the top 5
ORDER BY postsscore DESC, title DESC, commentsscore DESC
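
To see what the ROW_NUMBER() ... PARTITION BY step does without touching the public Reddit tables, here is a minimal sketch that runs the same pattern on the six example rows from the question. It uses Python's built-in sqlite3 module rather than BigQuery, so it is only an illustration of the technique: the table name comments, its columns, and the helper top_n_per_post are made up for this example, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Toy rows mirroring the example in the question: (post, comment, score).
ROWS = [
    ("Post 1", "Comment 1", 10),
    ("Post 1", "Comment 2", 9),
    ("Post 1", "Comment 3", 8),
    ("Post 2", "Comment 1", 10),
    ("Post 2", "Comment 2", 9),
    ("Post 2", "Comment 3", 8),
]


def top_n_per_post(rows, n):
    """Return the n highest-scoring comments for each post (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (post TEXT, comment TEXT, score INTEGER)")
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    query = """
        SELECT post, comment, score
        FROM (
            SELECT post, comment, score,
                   -- Number the comments within each post, best score first.
                   ROW_NUMBER() OVER (PARTITION BY post ORDER BY score DESC) AS pos
            FROM comments
        )
        WHERE pos <= ?
        ORDER BY post, score DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_n_per_post(ROWS, 2):
        print(row)
```

Calling top_n_per_post(ROWS, 2) keeps Comment 1 and Comment 2 for each post and drops Comment 3. Note that ROW_NUMBER() breaks ties arbitrarily; if comments with equal scores should all be kept, RANK() or DENSE_RANK() may be a better fit.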