How to paginate Stack Exchange Data Explorer (SEDE) results?
I created this query using the Data Explorer:
SELECT P.id, creationdate,tags,owneruserid,answercount
--SELECT DISTINCT TAGNAME ,TAGID
FROM TAGS AS T
JOIN POSTTAGS AS PT
ON T.ID = PT.TAGID
JOIN POSTS AS P
ON PT.POSTID = P.ID
--WHERE CAST(P.TAGS AS VARCHAR) IN('JAVA')
WHERE PT.TAGID = 3143
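How can I add paging to the query so that I get not only the first 50,000 results, but can also run the query again to fetch the remaining results?
There are several ways to "page" through T-SQL results; see: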
- How to return a page of results from SQL?
and
- SQL performance: WHERE vs WHERE(ROW_NUMBER)
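Here I'll use the CTE approach because:
- It pages through the results using convenient row numbers, rather than trying to track harder-to-predict factors such as creationdate.
- It reportedly performs faster than the OFFSET approach.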
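For comparison, the OFFSET approach mentioned above might look roughly like the following. This is only a sketch, assuming the server supports OFFSET ... FETCH (SQL Server 2012 or later); the parameter names Offset and PageSize are made up for illustration and are not part of the original query.

```sql
-- Offset: Number of rows to skip (hypothetical parameter)
-- PageSize: Rows to return per run (hypothetical parameter)
SELECT
      P.id
    , P.creationdate
    , P.tags
    , P.owneruserid
    , P.answercount
FROM Posttags AS PT
JOIN Posts AS P ON PT.postid = P.id
WHERE PT.tagid = 3143
-- OFFSET ... FETCH requires an ORDER BY; P.id is added as a tie-breaker
ORDER BY P.creationdate, P.id
OFFSET ##Offset:INT?0## ROWS
FETCH NEXT ##PageSize:INT?50000## ROWS ONLY
```

With this form, each successive run would raise Offset by PageSize (0, 50000, 100000, ...).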
So the question's query becomes this SEDE query:
-- StartRow: Starting row for paging
-- EndRow: Ending row for paging (Max 50K rows at a time)
WITH allData AS (
SELECT
ROW_NUMBER() OVER (ORDER BY P.creationdate) AS row
, P.id
, P.creationdate
, P.tags
, P.owneruserid
, P.answercount
FROM Posttags AS PT
JOIN Posts AS P ON PT.postid = P.id
WHERE PT.tagid = 3143 -- tag [scala]
)
SELECT *
FROM allData
WHERE row >= ##StartRow:INT?1##
AND row <= ##EndRow:INT?50000##
ORDER BY row
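To see how many runs are needed, one option is to count the matching rows first. A minimal sketch, assuming the same tag ID and the 50,000-row page size; the column aliases totalRows and pagesNeeded are illustrative only.

```sql
-- Count the rows the paged query will number, and the pages that implies
SELECT
      COUNT(*) AS totalRows
    , CEILING(COUNT(*) / 50000.0) AS pagesNeeded
FROM Posttags AS PT
JOIN Posts AS P ON PT.postid = P.id
WHERE PT.tagid = 3143
```

For the second run of the paged query, set StartRow to 50001 and EndRow to 100000, then continue in steps of 50,000 until EndRow reaches totalRows.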