SQL: rank only consecutive rows
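I have a query in which I rank rows based on 3 columns.
This works, except that any row with the same values in those 3 columns gets the next rank even when it is not consecutive in the output. I want a matching row to get the next rank only if it directly follows the previous matching row; otherwise its rank should restart at 1.
I tried the following code: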
SELECT DISTINCT DENSE_RANK () OVER (PARTITION BY Patient_ID,
Opnametype,
afdelingscode ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rnk,
*
FROM t_opnames
ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd
The output is:
rnk Opnamenummer Patient_ID afdelingscode Opnametype Specialismen OntslagDatumTijd ...
1 2983800 100006 RD8-GH MAU Inpatient-E GM 2014-09-01 14:50:00.000
2 2983800 100006 RD8-GH MAU Inpatient-E GM 2014-09-02 19:32:00.000
1 2983800 100006 RD8-GH Ward 08 Inpatient-E GM 2014-09-03 17:12:00.000
1 2983800 100006 RD8-GH Endo Inpatient-E GM 2014-09-09 09:06:00.000
2 2983800 100006 RD8-GH Ward 08 Inpatient-E GM 2014-09-17 17:00:00.000
3 2983800 100006 RD8-GH Ward 08 Inpatient-E GM 2014-10-01 17:15:00.000
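So every row is correct except the last two. I want them ranked 1 and 2 instead of 2 and 3, because the row with "RD8-GH Endo" sits between them and the earlier "RD8-GH Ward 08" row.
How can I do this?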
You can do this with a correlated subquery. Try something like this:
DECLARE @t_opnames TABLE
(
Opnamenummer INT,
Patient_ID INT,
afdelingscode VARCHAR(100),
Opnametype VARCHAR(100),
Specialismen CHAR(2),
OntslagDatumTijd DATETIME
)
Insert into @t_opnames
SELECT 2983800 ,100006, 'RD8-GH MAU', 'Inpatient-E', 'GM', '2014-09-01 14:50:00.000'
UNION ALL SELECT 2983800 ,100006, 'RD8-GH MAU', 'Inpatient-E', 'GM', '2014-09-02 19:32:00.000'
UNION ALL SELECT 2983800 ,100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM', '2014-09-03 17:12:00.000'
UNION ALL SELECT 2983800 ,100006, 'RD8-GH Endo', 'Inpatient-E', 'GM', '2014-09-09 09:06:00.000'
UNION ALL SELECT 2983800 ,100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM', '2014-09-17 17:00:00.000'
UNION ALL SELECT 2983800 ,100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM', '2014-10-01 17:15:00.000'
;WITH CTE as
(
SELECT DENSE_RANK() OVER(ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) rnk,*
FROM @t_opnames
)
SELECT rnk-ISNULL((
SELECT MAX(rnk)
FROM CTE c2
WHERE c2.Opnamenummer <= c1.Opnamenummer
AND c2.SPECIALISMEN <= c1.SPECIALISMEN
AND c2.OntslagDatumTijd <= c1.OntslagDatumTijd
AND c2.rnk < c1.rnk
AND (c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
OR c2.afdelingscode <> c1.afdelingscode)),0) rnk,Patient_ID, Opnametype,afdelingscode,Opnamenummer, SPECIALISMEN, OntslagDatumTijd
FROM CTE c1
ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd
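This does not directly answer the question, but my aim is to explain why what you are trying does not work the way you expect.
Your problem is caused by the PARTITION BY clause. If you set aside the columns whose values are the same in every sample row (Patient_ID and Opnametype), the only partitioning column left is afdelingscode. So, put simply, your PARTITION BY groups the data like this: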
RD8-GH Endo
RD8-GH MAU
RD8-GH MAU
RD8-GH Ward 08
RD8-GH Ward 08
RD8-GH Ward 08
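The ORDER BY clause determines the order within each partition. Again setting aside the columns that do not vary leaves ORDER BY OntslagDatumTijd, which produces the following. The rows are sorted by the date column, but note that they are still separated into partitions by afdelingscode: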
afdelingscode OntslagDatumTijd
RD8-GH Endo 2014-09-09 09:06:00.000
RD8-GH MAU 2014-09-01 14:50:00.000
RD8-GH MAU 2014-09-02 19:32:00.000
RD8-GH Ward 08 2014-09-03 17:12:00.000
RD8-GH Ward 08 2014-09-17 17:00:00.000
RD8-GH Ward 08 2014-10-01 17:15:00.000
The ranking is then applied within those partitions, so the output becomes:
rnk afdelingscode OntslagDatumTijd
1 RD8-GH Endo 2014-09-09 09:06:00.000
1 RD8-GH MAU 2014-09-01 14:50:00.000
2 RD8-GH MAU 2014-09-02 19:32:00.000
1 RD8-GH Ward 08 2014-09-03 17:12:00.000
2 RD8-GH Ward 08 2014-09-17 17:00:00.000
3 RD8-GH Ward 08 2014-10-01 17:15:00.000
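So the rows are ranked exactly as you specified. The problem in your output appears because the ORDER BY at the end of your SELECT (again ignoring the columns that do not vary) sorts by the date column OntslagDatumTijd, which interleaves the partitions and gives you: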
rnk afdelingscode OntslagDatumTijd
1 RD8-GH MAU 2014-09-01 14:50:00.000
2 RD8-GH MAU 2014-09-02 19:32:00.000
1 RD8-GH Ward 08 2014-09-03 17:12:00.000
1 RD8-GH Endo 2014-09-09 09:06:00.000
2 RD8-GH Ward 08 2014-09-17 17:00:00.000
3 RD8-GH Ward 08 2014-10-01 17:15:00.000
I'll keep looking at this if the other answers posted don't meet your requirements.
Reference:
PARTITION BY Divides the query result set into partitions. The window
function is applied to each partition separately and computation
restarts for each partition.
ORDER BY clause Defines the logical order of the rows within each
partition of the result set. That is, it specifies the logical order
in which the window function calculation is performed.
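To make the explanation above concrete, here is a minimal sketch that reproduces the partitioned ranking on the six sample rows. It is not SQL Server: it assumes Python's built-in sqlite3 module with a SQLite build that supports window functions (3.25 or later), and the table is cut down to the two columns that actually vary. The function name partition_ranks is made up for this illustration.

```python
import sqlite3

# Sample rows from the question, reduced to (afdelingscode, OntslagDatumTijd).
ROWS = [
    ("RD8-GH MAU", "2014-09-01 14:50:00"),
    ("RD8-GH MAU", "2014-09-02 19:32:00"),
    ("RD8-GH Ward 08", "2014-09-03 17:12:00"),
    ("RD8-GH Endo", "2014-09-09 09:06:00"),
    ("RD8-GH Ward 08", "2014-09-17 17:00:00"),
    ("RD8-GH Ward 08", "2014-10-01 17:15:00"),
]


def partition_ranks(rows):
    """Return (rnk, afdelingscode) pairs sorted by date, ranked per partition."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t_opnames (afdelingscode TEXT, OntslagDatumTijd TEXT)"
    )
    con.executemany("INSERT INTO t_opnames VALUES (?, ?)", rows)
    # The rank is computed inside each afdelingscode partition; the outer
    # ORDER BY only changes the display order, not the rank values.
    result = con.execute(
        """
        SELECT DENSE_RANK() OVER (PARTITION BY afdelingscode
                                  ORDER BY OntslagDatumTijd) AS rnk,
               afdelingscode
        FROM t_opnames
        ORDER BY OntslagDatumTijd
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for rnk, code in partition_ranks(ROWS):
        print(rnk, code)
```

The last two "RD8-GH Ward 08" rows come back with ranks 2 and 3, because the window function never sees the "RD8-GH Endo" row that the final sort places between them.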
Here is a potential solution. It may have performance problems depending on the size of your data, but you can test it:
-- sets up your dummy data
CREATE TABLE #t_opnames
(
Opnamenummer INT ,
Patient_ID INT ,
afdelingscode NVARCHAR(20) ,
Opnametype NVARCHAR(20) ,
Specialismen NVARCHAR(20) ,
OntslagDatumTijd DATETIME
);
INSERT INTO #t_opnames
( Opnamenummer, Patient_ID, afdelingscode, Opnametype, Specialismen,
OntslagDatumTijd )
VALUES ( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
'2014-09-01 14:50:00.000' ),
( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
'2014-09-02 19:32:00.000' ),
( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
'2014-09-03 17:12:00.000' ),
( 2983800, 100006, 'RD8-GH Endo', 'Inpatient-E', 'GM',
'2014-09-09 09:06:00.000' ),
( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
'2014-09-17 17:00:00.000' ),
( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
'2014-10-01 17:15:00.000' )
-- I've added a row number to your data to enable iteration over the data
SELECT ROW_NUMBER() OVER ( ORDER BY OntslagDatumTijd ) AS rn ,
*
INTO #temp
FROM #t_opnames
ORDER BY OntslagDatumTijd
-- this will iterate over the rows and apply the rankings
;WITH cte AS (
SELECT *, 1 AS rnk
FROM #temp
WHERE rn = 1
UNION ALL
SELECT t.*, CASE WHEN cte.afdelingscode = t.afdelingscode
THEN cte.rnk + 1
ELSE 1
END AS rnk
FROM #temp t
INNER JOIN cte ON cte.rn +1 = t.rn
)
SELECT * FROM cte
DROP TABLE #t_opnames
DROP TABLE #temp
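With larger data sets you will hit the MAXRECURSION limit (100 levels by default), because the recursive CTE recurses once per row. To get around that, change the limit by adding the following after the final SELECT: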
SELECT * FROM cte
OPTION (MAXRECURSION 0)
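Setting this value to 0 removes the limit entirely. If you know the size of your data set in advance, you can set the number to that size instead, as long as it does not exceed 32767, the largest value MAXRECURSION accepts.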
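As a rough illustration of what the recursive CTE computes, the same row-by-row logic can be written as a single pass in plain Python. This is only a sketch of the idea, not a replacement for the query: the function name consecutive_ranks is hypothetical, and the input is assumed to be the afdelingscode values already sorted by OntslagDatumTijd (the order that rn encodes in #temp).

```python
# afdelingscode values from the sample data, already in date order.
KEYS = [
    "RD8-GH MAU",
    "RD8-GH MAU",
    "RD8-GH Ward 08",
    "RD8-GH Endo",
    "RD8-GH Ward 08",
    "RD8-GH Ward 08",
]


def consecutive_ranks(keys):
    """Rank each key by its position within its run of equal neighbours."""
    ranks = []
    previous_key = None
    for index, key in enumerate(keys):
        if index > 0 and key == previous_key:
            # Same key as the previous row: continue the current run.
            ranks.append(ranks[-1] + 1)
        else:
            # First row, or the key changed: restart at 1.
            ranks.append(1)
        previous_key = key
    return ranks


if __name__ == "__main__":
    for rank, key in zip(consecutive_ranks(KEYS), KEYS):
        print(rank, key)
```

Each step looks only at the previous row, which is exactly what the join on cte.rn + 1 = t.rn does. That is also why the CTE needs one recursion level per row.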
I finally found a solution for my query. I now get the output I wanted, and it runs over 75k+ rows in under 3 seconds. The code I used is:
SELECT DISTINCT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rownum,
* INTO #temp
FROM t_opnames
ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd;
WITH CTE
AS (SELECT *,
ROW_NUMBER () OVER (ORDER BY rownum) - ROW_NUMBER () OVER (PARTITION BY Patient_ID,
Opnametype,
afdelingscode ORDER BY rownum) AS RowGroup
FROM #temp)
SELECT ROW_NUMBER () OVER (PARTITION BY RowGroup,
Patient_ID,
Opnametype,
afdelingscode ORDER BY rownum) AS GroupSequence,
*
FROM CTE
ORDER BY rownum;
DROP TABLE #temp;
I based this on the example posted on this page.
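For anyone wondering why subtracting two row numbers works, here is a small sketch that runs the same pattern on the six sample rows and exposes the intermediate RowGroup value. It assumes Python's built-in sqlite3 module with window function support (SQLite 3.25 or later) instead of SQL Server, uses CTEs in place of the #temp table, and keeps only the columns that vary in the sample. The function name island_ranks is made up for the illustration.

```python
import sqlite3

# Sample rows from the question, reduced to (afdelingscode, OntslagDatumTijd).
ROWS = [
    ("RD8-GH MAU", "2014-09-01 14:50:00"),
    ("RD8-GH MAU", "2014-09-02 19:32:00"),
    ("RD8-GH Ward 08", "2014-09-03 17:12:00"),
    ("RD8-GH Endo", "2014-09-09 09:06:00"),
    ("RD8-GH Ward 08", "2014-09-17 17:00:00"),
    ("RD8-GH Ward 08", "2014-10-01 17:15:00"),
]


def island_ranks(rows):
    """Return (rownum, afdelingscode, RowGroup, GroupSequence) per row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t_opnames (afdelingscode TEXT, OntslagDatumTijd TEXT)"
    )
    con.executemany("INSERT INTO t_opnames VALUES (?, ?)", rows)
    result = con.execute(
        """
        WITH numbered AS (
            -- Overall position of each row in the desired output order.
            SELECT ROW_NUMBER() OVER (ORDER BY OntslagDatumTijd) AS rownum,
                   afdelingscode
            FROM t_opnames
        ),
        grouped AS (
            -- Overall position minus position within the key's partition.
            -- The difference stays constant while rows with the same key
            -- are consecutive and changes after an interruption.
            SELECT rownum,
                   afdelingscode,
                   rownum - ROW_NUMBER() OVER (PARTITION BY afdelingscode
                                               ORDER BY rownum) AS RowGroup
            FROM numbered
        )
        SELECT rownum,
               afdelingscode,
               RowGroup,
               ROW_NUMBER() OVER (PARTITION BY RowGroup, afdelingscode
                                  ORDER BY rownum) AS GroupSequence
        FROM grouped
        ORDER BY rownum
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in island_ranks(ROWS):
        print(row)
```

On this data the "RD8-GH Endo" row and the last two "RD8-GH Ward 08" rows end up with the same RowGroup value, so the final PARTITION BY has to include the key columns as well as RowGroup, as the query above does.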