SQL rank only continuous rows

Question

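I have a query in which I rank rows based on 3 columns. This works, except that when rows contain the same data in those 3 columns, they get the next rank even if they are not consecutive in the output. I want the next rank to be given only when the matching rows are consecutive; otherwise the ranking should start again at 1. I tried the following code: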

  SELECT DISTINCT DENSE_RANK () OVER (PARTITION BY Patient_ID, 
                                                 Opnametype, 
                                                 afdelingscode ORDER BY  Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rnk, 
                *
  FROM t_opnames
  ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd

The output is:

rnk Opnamenummer Patient_ID afdelingscode     Opnametype   Specialismen  OntslagDatumTijd ...
1   2983800      100006     RD8-GH MAU        Inpatient-E  GM            2014-09-01 14:50:00.000
2   2983800      100006     RD8-GH MAU        Inpatient-E  GM            2014-09-02 19:32:00.000
1   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-09-03 17:12:00.000  
1   2983800      100006     RD8-GH Endo       Inpatient-E  GM            2014-09-09 09:06:00.000
2   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-09-17 17:00:00.000
3   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-10-01 17:15:00.000

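So all rows are correct except the last two. I want them ranked 1 and 2 instead of 2 and 3, because the row with "RD8-GH Endo" sits between them and the earlier "RD8-GH Ward 08" row. How can I do this?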

Answer 1

You can do this with a correlated subquery. Use something like this:

DECLARE @t_opnames TABLE
(
    Opnamenummer INT,
    Patient_ID INT,
    afdelingscode     VARCHAR(100),
    Opnametype   VARCHAR(100),
    Specialismen  CHAR(2),
    OntslagDatumTijd DATETIME
)

Insert into @t_opnames
SELECT  2983800      ,100006,     'RD8-GH MAU',        'Inpatient-E',  'GM',            '2014-09-01 14:50:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH MAU',        'Inpatient-E',  'GM',            '2014-09-02 19:32:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-09-03 17:12:00.000'  
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Endo',       'Inpatient-E',  'GM',            '2014-09-09 09:06:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-09-17 17:00:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-10-01 17:15:00.000'


;WITH CTE as 
(
SELECT DENSE_RANK() OVER(ORDER BY  Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) rnk,* 
  FROM @t_opnames
)
SELECT rnk-ISNULL((
            SELECT MAX(rnk) 
            FROM CTE c2 
            WHERE c2.Opnamenummer <= c1.Opnamenummer
            AND c2.SPECIALISMEN <= c1.SPECIALISMEN
            AND c2.OntslagDatumTijd <= c1.OntslagDatumTijd
            AND c2.rnk < c1.rnk
            AND (c2.Patient_ID <> c1.Patient_ID 
                OR   c2.Opnametype <> c1.Opnametype 
                OR c2.afdelingscode <> c1.afdelingscode)),0) rnk,Patient_ID, Opnametype,afdelingscode,Opnamenummer, SPECIALISMEN, OntslagDatumTijd
FROM CTE c1
  ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd

Answer 2

This doesn't directly answer the question, but my aim is to explain why what you are trying does not work the way you expect.

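Your problem is caused by the PARTITION BY. If you leave out the PARTITION BY columns whose values are the same on every sample row (Patient_ID and Opnametype), you are left with afdelingscode. So, put simply, your PARTITION BY is grouping the data like this: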

RD8-GH Endo
RD8-GH MAU
RD8-GH MAU
RD8-GH Ward 08
RD8-GH Ward 08
RD8-GH Ward 08

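The ORDER BY clause determines the order within each partition. Again leaving out the columns that are constant in the sample gives you ORDER BY OntslagDatumTijd, which produces the following. The rows are sorted by the date column, but note that the partitions are still separated by afdelingscode: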

afdelingscode   OntslagDatumTijd
RD8-GH Endo     2014-09-09 09:06:00.000
RD8-GH MAU      2014-09-01 14:50:00.000
RD8-GH MAU      2014-09-02 19:32:00.000
RD8-GH Ward 08  2014-09-03 17:12:00.000
RD8-GH Ward 08  2014-09-17 17:00:00.000
RD8-GH Ward 08  2014-10-01 17:15:00.000

The ranking is then applied within these partitions. The output becomes:

rnk afdelingscode   OntslagDatumTijd
1   RD8-GH Endo     2014-09-09 09:06:00.000
1   RD8-GH MAU      2014-09-01 14:50:00.000
2   RD8-GH MAU      2014-09-02 19:32:00.000
1   RD8-GH Ward 08  2014-09-03 17:12:00.000
2   RD8-GH Ward 08  2014-09-17 17:00:00.000
3   RD8-GH Ward 08  2014-10-01 17:15:00.000

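So it is ranking exactly the way you specified. The problem in your output arises because the ORDER BY at the end of your SELECT (again leaving out the constant columns) sorts by the date column OntslagDatumTijd, which gives you: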

rnk afdelingscode   OntslagDatumTijd
1   RD8-GH MAU      2014-09-01 14:50:00.000
2   RD8-GH MAU      2014-09-02 19:32:00.000
1   RD8-GH Ward 08  2014-09-03 17:12:00.000
1   RD8-GH Endo     2014-09-09 09:06:00.000
2   RD8-GH Ward 08  2014-09-17 17:00:00.000
3   RD8-GH Ward 08  2014-10-01 17:15:00.000
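
The walkthrough above can be reproduced in isolation. The following is only a sketch: it assumes a hypothetical table variable named @demo that holds just the two columns that vary in the sample data, and it partitions by afdelingscode alone because Patient_ID and Opnametype are constant in the sample.

```sql
-- Hypothetical table variable (not part of the original query);
-- it keeps only the two columns that vary in the sample rows.
DECLARE @demo TABLE
(
    afdelingscode    VARCHAR(100),
    OntslagDatumTijd DATETIME
);

-- ISO 8601 literals (with the T separator) do not depend on language settings.
INSERT INTO @demo (afdelingscode, OntslagDatumTijd)
VALUES ('RD8-GH MAU',     '2014-09-01T14:50:00'),
       ('RD8-GH MAU',     '2014-09-02T19:32:00'),
       ('RD8-GH Ward 08', '2014-09-03T17:12:00'),
       ('RD8-GH Endo',    '2014-09-09T09:06:00'),
       ('RD8-GH Ward 08', '2014-09-17T17:00:00'),
       ('RD8-GH Ward 08', '2014-10-01T17:15:00');

-- The rank restarts per afdelingscode value, not per consecutive run,
-- and the final ORDER BY then interleaves the partitions by date.
SELECT DENSE_RANK() OVER (PARTITION BY afdelingscode
                          ORDER BY OntslagDatumTijd) AS rnk,
       afdelingscode,
       OntslagDatumTijd
FROM @demo
ORDER BY OntslagDatumTijd;
```

All three 'RD8-GH Ward 08' rows fall into one partition regardless of the Endo row between them, which is why the last two receive 2 and 3.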

If the other posted answers don't meet your requirements, I'll keep looking into it.

Reference:

OVER Clause

PARTITION BY Divides the query result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

ORDER BY clause Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.

Here is a potential solution. It may have performance problems depending on the size of your data, but you can test it:

-- sets up your dummy data
CREATE TABLE #t_opnames
    (
      Opnamenummer INT ,
      Patient_ID INT ,
      afdelingscode NVARCHAR(20) ,
      Opnametype NVARCHAR(20) ,
      Specialismen NVARCHAR(20) ,
      OntslagDatumTijd DATETIME
    );

INSERT  INTO #t_opnames
        ( Opnamenummer, Patient_ID, afdelingscode, Opnametype, Specialismen,
          OntslagDatumTijd )
VALUES  ( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
          '2014-09-01 14:50:00.000' ),
        ( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
          '2014-09-02 19:32:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-09-03 17:12:00.000' ),
        ( 2983800, 100006, 'RD8-GH Endo', 'Inpatient-E', 'GM',
          '2014-09-09 09:06:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-09-17 17:00:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-10-01 17:15:00.000' )

-- I've added a row number to your data to enable iteration over the data
SELECT  ROW_NUMBER() OVER ( ORDER BY OntslagDatumTijd ) AS rn ,
        *
INTO #temp
FROM    #t_opnames
ORDER BY OntslagDatumTijd
-- this will iterate over the rows and apply the rankings
;WITH cte AS (
    SELECT *, 1 AS rnk 
    FROM #temp 
    WHERE rn = 1

    UNION ALL 

    SELECT t.*, CASE WHEN cte.afdelingscode = t.afdelingscode 
                     THEN cte.rnk + 1 
                     ELSE 1 
                END AS rnk 
    FROM #temp t
    INNER JOIN cte ON cte.rn +1 = t.rn
)
SELECT * FROM cte

DROP TABLE #t_opnames
DROP TABLE #temp

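On larger data sets you will hit the MAXRECURSION limit (100 levels by default). To get around that, change the limit by adding the following after the final SELECT: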

SELECT * FROM cte
OPTION (MAXRECURSION 0)

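Setting this value to 0 imposes no limit. If you know the size of your data set in advance, you can set the number to that size instead (the maximum explicit value is 32767).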

Answer 3

I finally found a solution for my query. I now get the output I want, and it runs in under 3 seconds on more than 75k rows. The code I used is:

SELECT DISTINCT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rownum, 
            * INTO #temp
FROM t_opnames
ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd;

WITH CTE
AS (SELECT *, 
           ROW_NUMBER () OVER (ORDER BY rownum) - ROW_NUMBER () OVER (PARTITION BY Patient_ID, 
                                                                                   Opnametype, 
                                                                                   afdelingscode ORDER BY rownum) AS RowGroup
      FROM #temp) 
SELECT ROW_NUMBER () OVER (PARTITION BY RowGroup, 
                                        Patient_ID, 
                                        Opnametype, 
                                        afdelingscode ORDER BY rownum) AS GroupSequence, 
       *
  FROM CTE
  ORDER BY rownum;

DROP TABLE #temp;
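
To see why subtracting the two ROW_NUMBER() values isolates consecutive runs, it can help to display the intermediate columns. This is a sketch only: it assumes the six sample rows from the question loaded into a hypothetical table variable @sample with rownum already assigned, and the column aliases overall_rn and part_rn are illustrative names, not part of the original query.

```sql
-- Hypothetical table variable standing in for #temp, limited to the relevant columns.
DECLARE @sample TABLE
(
    rownum        INT,
    Patient_ID    INT,
    Opnametype    VARCHAR(100),
    afdelingscode VARCHAR(100)
);

INSERT INTO @sample (rownum, Patient_ID, Opnametype, afdelingscode)
VALUES (1, 100006, 'Inpatient-E', 'RD8-GH MAU'),
       (2, 100006, 'Inpatient-E', 'RD8-GH MAU'),
       (3, 100006, 'Inpatient-E', 'RD8-GH Ward 08'),
       (4, 100006, 'Inpatient-E', 'RD8-GH Endo'),
       (5, 100006, 'Inpatient-E', 'RD8-GH Ward 08'),
       (6, 100006, 'Inpatient-E', 'RD8-GH Ward 08');

WITH CTE AS
(
    SELECT *,
           -- position in the overall ordering
           ROW_NUMBER() OVER (ORDER BY rownum) AS overall_rn,
           -- position among rows that share the same key
           ROW_NUMBER() OVER (PARTITION BY Patient_ID, Opnametype, afdelingscode
                              ORDER BY rownum) AS part_rn
    FROM @sample
)
SELECT rownum,
       afdelingscode,
       overall_rn,
       part_rn,
       overall_rn - part_rn AS RowGroup,
       ROW_NUMBER() OVER (PARTITION BY overall_rn - part_rn,
                                       Patient_ID, Opnametype, afdelingscode
                          ORDER BY rownum) AS GroupSequence
FROM CTE
ORDER BY rownum;

-- rownum  afdelingscode   overall_rn  part_rn  RowGroup  GroupSequence
-- 1       RD8-GH MAU      1           1        0         1
-- 2       RD8-GH MAU      2           2        0         2
-- 3       RD8-GH Ward 08  3           1        2         1
-- 4       RD8-GH Endo     4           1        3         1
-- 5       RD8-GH Ward 08  5           2        3         1
-- 6       RD8-GH Ward 08  6           3        3         2
```

Within a run of consecutive rows that share the same key, both counters advance by one per row, so their difference stays constant. When rows with a different key intervene, only the overall counter advances, so the difference changes and a new group starts. RowGroup on its own is not unique across keys (the Endo row and the second Ward 08 run both get 3 here), which is why the key columns remain in the final PARTITION BY.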

I based this on the example posted on this page.
