Dense Rank with order by

I have an Assignment table like this:

EMPLID | RCD | COMPANY |   EFFDT       |  SALARY
---------------------------------------------------
100    | 0   | xyz     |   1/1/2000    |    1000
100    | 0   | xyz     |   1/15/2000   |    1100
100    | 0   | xyz     |   1/31/2000   |    1200
100    | 0   | ggg     |   2/15/2000   |    1500
100    | 1   | abc     |   3/1/2000    |    2000
100    | 1   | abc     |   4/1/2000    |    2100

I need a counter that increases whenever the RCD or COMPANY combination changes, and it should follow EFFDT order.

EMPLID | RCD | COMPANY |   EFFDT       |  SALARY     | COUNTER
-------|-----|---------|---------------|-------------|----------
100    | 0   | xyz     |   1/1/2000    |    1000     | 1
100    | 0   | xyz     |   1/15/2000   |    1100     | 1
100    | 0   | xyz     |   1/31/2000   |    1200     | 1
100    | 0   | ggg     |   2/15/2000   |    1500     | 2
100    | 1   | abc     |   3/1/2000    |    2000     | 3
100    | 1   | abc     |   4/1/2000    |    2100     | 3

I tried the DENSE_RANK function ordered by EMPLID, RCD, COMPANY. It gives me a counter, but the counter does not follow EFFDT order.

SELECT EMPLID,RCD,COMPANY,EFFDT,
    DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM ASSIGNMENT ;

Ordering by EFFDT instead gives an incrementing counter 1 ... 6:

SELECT EMPLID,RCD,COMPANY,EFFDT,
  DENSE_RANK() over (order by EFFDT) AS COUNTER 
FROM ASSIGNMENT;

Please help me find out what I am missing.

Try LAG

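WITH flagged AS (
    SELECT t.*,
           -- 0 when RCD and COMPANY both match the previous row (by EFFDT); 1 otherwise, including the first row
           CASE WHEN LAG(RCD)     OVER (PARTITION BY EMPLID ORDER BY EFFDT) = RCD
                 AND LAG(COMPANY) OVER (PARTITION BY EMPLID ORDER BY EFFDT) = COMPANY
                THEN 0 ELSE 1 END AS strtFlag
    FROM tbl t
)

-- the running total of the start flags is the counter
SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY,
       SUM(strtFlag) OVER (PARTITION BY EMPLID ORDER BY EFFDT) AS COUNTER
FROM flagged;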

Or, use DENSE_RANK() with groups

WITH grps AS (
    SELECT t.*,
           -- the difference of the two row numbers stays constant within a consecutive run
           ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY EFFDT) -
           ROW_NUMBER() OVER (PARTITION BY EMPLID, RCD, COMPANY ORDER BY EFFDT) AS grp
    FROM tbl t
)

SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY,
       DENSE_RANK() OVER (PARTITION BY EMPLID ORDER BY grp) AS COUNTER
FROM grps;

Either way, it looks like two steps are needed to get a dense numbering.
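
One caveat on the row-number difference, offered as a hedged sketch rather than a definitive fix: grp on its own is only guaranteed to be distinct between runs of the same (RCD, COMPANY) pair, so two different pairs can end up with the same grp value. For example, if a row (0, xyz) dated 5/1/2000 were appended to the six sample rows, it would get grp = 7 - 4 = 3, the same as the (0, ggg) row (4 - 1 = 3), and the two would share a counter value. A possible variant, assuming EFFDT is unique per EMPLID, is to rank each run by its first EFFDT. The CTE name islands and the column island_start are names made up for this sketch; tbl and its columns are as in the queries above.

```sql
-- Sketch only: "islands" and "island_start" are illustrative names, not from the original answer.
-- Assumes EFFDT is unique within each EMPLID (ROW_NUMBER is not deterministic on ties).
WITH grps AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY EFFDT)
         - ROW_NUMBER() OVER (PARTITION BY EMPLID, RCD, COMPANY ORDER BY EFFDT) AS grp
    FROM tbl t
),
islands AS (
    -- (EMPLID, RCD, COMPANY, grp) identifies one consecutive run; take its first date
    SELECT g.*,
           MIN(EFFDT) OVER (PARTITION BY EMPLID, RCD, COMPANY, grp) AS island_start
    FROM grps g
)
SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY,
       -- runs are numbered in the order in which they begin
       DENSE_RANK() OVER (PARTITION BY EMPLID ORDER BY island_start) AS COUNTER
FROM islands
ORDER BY EMPLID, EFFDT;
```

This costs a third pass, and it starts a new counter value every time the pair changes, even when it changes back to a pair seen earlier.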

I think you are looking for:

SELECT EMPLID,RCD,COMPANY,EFFDT,
    DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM (select * from ASSIGNMENT order by EFFDT);

SELECT EMPLID,RCD,COMPANY,EFFDT,
    DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM (select * from ASSIGNMENT order by EMPLID , RCD , COMPANY, EFFDT);

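This should work, given the clarification that a combination of rcd and company should keep the same "counter" even if it appears in non-consecutive periods. I added more rows to the test data to make sure the result is correct.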

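Like Serg's solution (which answers a different question), this solution makes one pass over the base data and then a second pass over the results of the first pass (all in memory, so it should be relatively fast). There is no way around that: the task needs two different analytic functions, one of which depends on the result of the other, and nested analytic functions are not allowed. (This part of the answer addresses a comment by the OP on Serg's answer.)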

with
     test_data ( emplid, rcd, company, effdt, salary ) as (
       select 100, 0, 'xyz', to_date('1/1/2000' , 'mm/dd/yyyy'), 1000 from dual union all
       select 100, 0, 'xyz', to_date('1/15/2000', 'mm/dd/yyyy'), 1100 from dual union all
       select 100, 0, 'xyz', to_date('1/31/2000', 'mm/dd/yyyy'), 1200 from dual union all
       select 100, 0, 'ggg', to_date('2/15/2000', 'mm/dd/yyyy'), 1500 from dual union all
       select 100, 1, 'abc', to_date('3/1/2000' , 'mm/dd/yyyy'), 2000 from dual union all
       select 100, 1, 'abc', to_date('4/1/2000' , 'mm/dd/yyyy'), 2100 from dual union all
       select 100, 0, 'xyz', to_date('5/1/2000' , 'mm/dd/yyyy'), 2200 from dual union all
       select 100, 1, 'ggg', to_date('8/15/2000', 'mm/dd/yyyy'), 2300 from dual
     )
-- end of test data; the actual solution (SQL query) begins below this line
select emplid, rcd, company, effdt, salary,
       dense_rank() over (partition by emplid order by min_dt) as counter
from ( select emplid, rcd, company, effdt, salary, 
              min(effdt) over (partition by emplid, rcd, company) as min_dt
       from   test_data )
order by effdt                --   ORDER BY is optional
;

    EMPLID        RCD COM EFFDT                   SALARY    COUNTER
---------- ---------- --- ------------------- ---------- ----------
       100          0 xyz 2000-01-01 00:00:00       1000          1
       100          0 xyz 2000-01-15 00:00:00       1100          1
       100          0 xyz 2000-01-31 00:00:00       1200          1
       100          0 ggg 2000-02-15 00:00:00       1500          2
       100          1 abc 2000-03-01 00:00:00       2000          3
       100          1 abc 2000-04-01 00:00:00       2100          3
       100          0 xyz 2000-05-01 00:00:00       2200          1
       100          1 ggg 2000-08-15 00:00:00       2300          4

 8 rows selected