How to Dense Rank Sets of Data
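I'm trying to get a dense rank that groups sets of data together. In my table I have ID, GRP_SET, SUB_SET, and INTERVAL, which simply represents a date field. When records are inserted for an ID, they are inserted as GRP_SETs of 3 rows, numbered by SUB_SET. As you can see, the interval can change slightly while a set is still being inserted.
Here is some example data; the DRANK column shows the ranking I'm trying to get.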
with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)
select * from q
Example data
ID  GRP_SET  SUB_SET  INTERVAL  DRANK
1   a        1        123       1
1   a        2        123       1
1   a        3        124       1
1   b        1        234       2
1   b        3        235       2
1   b        2        235       2
1   a        1        331       3
1   a        2        331       3
1   a        3        331       3
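This is the query I have that gets close, but I seem to need something like:
- Partition by: ID
- Order within partition by: ID, INTERVAL
- Change rank when: ID, GRP_SET (change)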
select
id, GRP_SET, SUB_SET, interval,
DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
id, interval
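This might work for you. The complicating factor is that you want the same "DENSE RANK" for intervals 123 and 124, and likewise for intervals 234 and 235. So, for the purpose of ordering inside DENSE_RANK(), we truncate them down to a multiple of 10: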
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
FROM q
Please see SQL Fiddle demo here.
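If you only want intervals that are closer together to be grouped, you can multiply the value before truncating. This groups them into buckets of width 3 (but maybe you don't need anything that fine-grained):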
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
FROM q
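To see why both ORDER BY expressions put the sample rows into the intended groups, it can help to print the bucket keys next to the raw intervals. The following is only a sketch: it rebuilds the distinct interval values from the question's sample data, and the aliases bucket_10 and bucket_3 are made up for illustration.

```sql
-- Sketch only: bucket_10 and bucket_3 are illustrative aliases.
WITH q AS (
  SELECT 123 AS interval FROM dual UNION ALL
  SELECT 124 FROM dual UNION ALL
  SELECT 234 FROM dual UNION ALL
  SELECT 235 FROM dual UNION ALL
  SELECT 331 FROM dual
)
SELECT interval
     , TRUNC(interval, -1)          AS bucket_10  -- drop the ones digit
     , TRUNC(interval * 10 / 3, -1) AS bucket_3   -- buckets of width 3
FROM q
ORDER BY interval
-- INTERVAL  BUCKET_10  BUCKET_3
--      123        120       410
--      124        120       410
--      234        230       780
--      235        230       780
--      331        330      1100
```

Because the buckets are fixed, two intervals that straddle a boundary still get different keys: 129 truncates to 120 but 130 truncates to 130, even though they are only 1 apart.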
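Using the MODEL clause
Note that your requirement goes beyond what is easy to express in "ordinary" SQL. Luckily, you're using Oracle, which has the MODEL clause, a device whose mystery is only exceeded by its power (excellent whitepaper here). You could write: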
SELECT
id, grp_set, sub_set, interval, drank
FROM (
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
MODEL PARTITION BY (id)
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
MEASURES (grp_set, sub_set, interval, drank)
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Explanation:
SELECT
id, grp_set, sub_set, interval, drank
FROM (
-- Here, we initialise your "dense rank" to 1
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)
-- We generate row numbers for all rows, ordered by interval and sub_set,
-- so that each rule can address the previous row in that order
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
-- These are the columns that we want to generate from the MODEL clause
MEASURES (grp_set, sub_set, interval, drank)
-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Of course, this only works if there are no interleaved events, i.e. if no grp_set other than a occurs between 123 and 124.
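The same "add 1 whenever grp_set changes" rule can also be sketched with analytic functions alone, which some readers may find easier to follow than MODEL. This is an alternative formulation, not the query proposed above; it uses the sample data from the question, the names flagged, is_new_set and drank_test are made up for illustration, and it has the same limitation regarding interleaved sets.

```sql
-- Sketch only: flagged, is_new_set and drank_test are illustrative names.
WITH q AS (
  SELECT 1 id, 'a' grp_set, 1 AS sub_set, 123 AS interval, 1 AS drank FROM dual UNION ALL
  SELECT 1, 'a', 2, 123, 1 FROM dual UNION ALL
  SELECT 1, 'a', 3, 124, 1 FROM dual UNION ALL
  SELECT 1, 'b', 1, 234, 2 FROM dual UNION ALL
  SELECT 1, 'b', 2, 235, 2 FROM dual UNION ALL
  SELECT 1, 'b', 3, 235, 2 FROM dual UNION ALL
  SELECT 1, 'a', 1, 331, 3 FROM dual UNION ALL
  SELECT 1, 'a', 2, 331, 3 FROM dual UNION ALL
  SELECT 1, 'a', 3, 331, 3 FROM dual
),
flagged AS (
  -- Mark a row with 1 when its grp_set differs from the previous row's
  -- (or when there is no previous row), otherwise with 0
  SELECT id, grp_set, sub_set, interval, drank
       , CASE
           WHEN grp_set = LAG(grp_set) OVER (PARTITION BY id ORDER BY interval, sub_set)
           THEN 0
           ELSE 1
         END AS is_new_set
  FROM q
)
-- The running total of the markers is the desired rank
SELECT id, grp_set, sub_set, interval, drank
     , SUM(is_new_set) OVER (PARTITION BY id ORDER BY interval, sub_set) AS drank_test
FROM flagged
ORDER BY id, interval, sub_set
```

On the sample data the markers come out as 1, 0, 0, 1, 0, 0, 1, 0, 0, so drank_test is 1, 1, 1, 2, 2, 2, 3, 3, 3 and matches the DRANK column.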