How to paginate Stack Exchange Data Explorer (SEDE) results?

I created a query using the Data Explorer:

SELECT P.id, creationdate,tags,owneruserid,answercount
--SELECT DISTINCT TAGNAME ,TAGID
FROM TAGS  AS T
JOIN POSTTAGS AS PT
ON T.ID = PT.TAGID
JOIN POSTS AS P
ON PT.POSTID = P.ID
--WHERE CAST(P.TAGS AS VARCHAR) IN('JAVA')
WHERE PT.TAGID = 3143

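How can I add pagination to the query so that I get not only the first 50,000 results, but can also run the query again to fetch the next batch of remaining results?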

There are several ways to "page" through T-SQL results; see:

  • How to return a page of results from SQL?
  • SQL performance: WHERE vs WHERE(ROW_NUMBER)

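Here I'll use the CTE approach because:

  • It pages the results using a convenient row number, rather than trying to track harder-to-predict factors such as creationdate.
  • It reportedly performs faster than the OFFSET method.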
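
For comparison, the OFFSET method mentioned above might look roughly like the sketch below. It assumes the SQL Server version behind SEDE supports OFFSET ... FETCH (SQL Server 2012 or later). The parameter names Offset and PageSize are illustrative and do not come from the original question.

```sql
-- Offset: Number of rows to skip before the page starts
-- PageSize: Rows to return (Max 50K rows at a time)
SELECT      P.id
            , P.creationdate
            , P.tags
            , P.owneruserid
            , P.answercount
FROM        Posttags    AS PT
JOIN        Posts       AS P    ON PT.postid = P.id
WHERE       PT.tagid    = 3143
ORDER BY    P.creationdate, P.id    -- P.id breaks ties so the order is stable
OFFSET      ##Offset:INT?0## ROWS
FETCH NEXT  ##PageSize:INT?50000## ROWS ONLY
```

With this variant, successive runs would use Offset = 0, 50000, 100000, and so on.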

So, with the CTE approach, the question's query becomes this SEDE query:

-- StartRow: Starting row for paging
-- EndRow: Ending row for paging (Max 50K rows at a time)
WITH allData AS (
    SELECT
                ROW_NUMBER() OVER (ORDER BY P.creationdate, P.id) AS row  -- P.id breaks ties so paging is stable across runs
                , P.id
                , P.creationdate
                , P.tags
                , P.owneruserid
                , P.answercount
    FROM        Posttags    AS PT
    JOIN        Posts       AS P    ON PT.postid = P.id
    WHERE       PT.tagid    = 3143  -- tag [scala]
)
SELECT      *
FROM        allData
WHERE       row    >= ##StartRow:INT?1##
AND         row    <= ##EndRow:INT?50000##
ORDER BY    row
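
To fetch the next batch, run the query again with StartRow = 50001 and EndRow = 100000, then 100001 and 150000, and so on until a run returns fewer than 50,000 rows. To work out in advance how many runs are needed, a companion query along these lines could count the matching rows. This is a rough sketch, and the column aliases totalRows and pagesNeeded are illustrative names, not part of the original query.

```sql
-- Counts the rows the paged query will see, and how many 50K pages that is
SELECT      COUNT(*)                        AS totalRows
            , CEILING(COUNT(*) / 50000.0)   AS pagesNeeded
FROM        Posttags    AS PT
JOIN        Posts       AS P    ON PT.postid = P.id
WHERE       PT.tagid    = 3143  -- same tag filter as the paged query
```

Dividing by 50000.0 rather than 50000 avoids integer division, so a partial final page still counts as a page.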