Top-N by decade for successive decades (in SQL Server)
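I am trying to get a ranked list of the Top 5 (i.e. most frequently occurring) document titles, grouped by decade, for each of the 6 most recent decades.
Document titles are non-unique. In any given calendar year there may be dozens or even hundreds of documents with the same title.
The query below is as far as I have got. It gives me the top 5 titles, but only for the 'all others' decade.
How can I modify the query to get the top 5 titles for each of the other decades?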
SELECT
    Top 5 documentTitle AS 'Title',
    RANK() OVER (PARTITION BY calendarYear ORDER BY COUNT(documentTitle) DESC) AS Rank,
    COUNT(tblDocumentFact.inventionTitleEnID) AS 'Number of Occurrences',
    CASE
        WHEN calendarYear BETWEEN 2010 AND 2019 THEN '2010 - 2019'
        WHEN calendarYear BETWEEN 2000 AND 2009 THEN '2000 - 2009'
        WHEN calendarYear BETWEEN 1990 AND 1999 THEN '1990 - 1999'
        WHEN calendarYear BETWEEN 1980 AND 1989 THEN '1980 - 1989'
        WHEN calendarYear BETWEEN 1970 AND 1979 THEN '1970 - 1979'
        WHEN calendarYear BETWEEN 1960 AND 1969 THEN '1960 - 1969'
        ELSE 'all others'
    END AS Decade
FROM tbldocumentTitleDimension
INNER JOIN tblDocumentFact ON tbldocumentTitleDimension.documentTitleID = tblDocumentFact.documentTitleID
INNER JOIN tblDateDimension ON tblDocumentFact.publicationDateID = tblDateDimension.dateID
GROUP BY documentTitle,
    calendarYear
ORDER BY [Number of Occurrences] DESC
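If I followed you correctly, you want the top 5 per decade. If so:
- you need to group by decade rather than by calendar year to get the proper counts; it is easier to compute the decade in a subquery, so you don't have to repeat the case expression
- the rank should be computed over decade partitions rather than per year
- you can then use that rank column for filtering in an outer query
Consider: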
select *
from (
    select
        dtd.documenttitle as title,
        rank() over (partition by dd.decade order by count(*) desc) as rnk,
        count(*) as number_of_occurrences,
        dd.decade
    from tbldocumentTitleDimension dtd
    inner join tblDocumentFact df on dtd.documenttitleid = df.documenttitleid
    inner join (
        select
            dateid,
            case
                when calendarYear between 2010 and 2019 then '2010 - 2019'
                when calendarYear between 2000 and 2009 then '2000 - 2009'
                when calendarYear between 1990 and 1999 then '1990 - 1999'
                when calendarYear between 1980 and 1989 then '1980 - 1989'
                when calendarYear between 1970 and 1979 then '1970 - 1979'
                when calendarYear between 1960 and 1969 then '1960 - 1969'
                else 'all others'
            end AS decade
        from tblDateDimension
    ) dd on df.publicationdateid = dd.dateid
    group by dtd.documenttitle, dd.decade
) t
where rnk <= 5
order by decade, number_of_occurrences desc
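To try the group, rank, filter pattern without the real tables, here is a minimal self-contained sketch. Everything in it is an assumption made for illustration: the sample rows, the CTE names (docs, counted, ranked) and the decade_start column are hypothetical, and the decade is derived with integer division instead of the case expression purely to keep the example short (so there is no 'all others' bucket here). It keeps the top 2 per decade rather than the top 5 so the effect is visible on a handful of rows.

```sql
-- Hypothetical sample data: one row per document (title, publication year).
with docs as (
    select v.title, v.calendarYear
    from (values
        ('Alpha', 2011), ('Alpha', 2012), ('Alpha', 2015),
        ('Beta',  2013), ('Beta',  2018),
        ('Gamma', 2014),
        ('Alpha', 2003),
        ('Beta',  2001), ('Beta',  2005)
    ) as v(title, calendarYear)
),
-- Step 1: count occurrences per title and decade (not per year).
counted as (
    select
        d.title,
        (d.calendarYear / 10) * 10 as decade_start,  -- integer division: 2015 -> 2010
        count(*) as number_of_occurrences
    from docs d
    group by d.title, (d.calendarYear / 10) * 10
),
-- Step 2: rank the titles within each decade.
ranked as (
    select
        c.title,
        c.decade_start,
        c.number_of_occurrences,
        rank() over (
            partition by c.decade_start
            order by c.number_of_occurrences desc
        ) as rnk
    from counted c
)
-- Step 3: filter on the rank in the outer query.
select title, decade_start, number_of_occurrences, rnk
from ranked
where rnk <= 2
order by decade_start, rnk, title;
```

With this sample data the query returns Beta (2) and Alpha (1) for 2000, and Alpha (3) and Beta (2) for 2010; Gamma is ranked third in 2010 and is filtered out. Note that rank() gives tied titles the same rank, so a decade can return more than N rows when counts tie. If exactly N rows are required, row_number() with an explicit tie-breaker in the order by is one option.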
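Side notes:
- don't use single quotes for identifiers (although SQL Server allows it, single quotes should be reserved for literal strings, as defined in the SQL standard); better yet, use identifiers that do not require quoting
- in a multi-table query, always qualify every column name with the table it belongs to; I made a few assumptions here about which table each column comes from
- unless you want to exclude null values in the documentTitle column from the count, you can use count(*) instead of count(documentTitle); it is more straightforward and more efficient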
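The difference between the two count forms can be seen on a tiny example. This is only a sketch: the three inline rows and the aliases all_rows and non_null_titles are hypothetical and exist just for this illustration.

```sql
-- Hypothetical three-row sample; one title is null.
select
    count(*)               as all_rows,        -- counts every row: 3
    count(v.documentTitle) as non_null_titles  -- skips the null title: 2
from (values ('Alpha'), (null), ('Beta')) as v(documentTitle);
```

The aliases here are plain unquoted identifiers, in line with the first side note. If documentTitle can never be null, both expressions return the same number and count(*) states the intent more directly.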