Challenging Firebird Recursive CTE issue

This may not be strictly a Firebird problem, but I'm hoping there is a feature I don't know about that can take me beyond plain vanilla SQL.

I have two tables. The first is a list of names of "critical parameters," and the second relates object IDs, critical parameter names, and critical parameter values:

CREATE TABLE CRITICALPARAMS
(
PARAM Varchar(32) NOT NULL,
INDX INTEGER NOT NULL,
CONSTRAINT PK_CRITICALPARAMS_1 PRIMARY KEY (PARAM),
CONSTRAINT UNQ_CRITICALPARAMS_1 UNIQUE (INDX)
);

CREATE TABLE CRITICALPARAMVALS
(
ID INTEGER NOT NULL,
PARAM Varchar(32) NOT NULL,
VAL Float NOT NULL,
CONSTRAINT PK_CRITICALPARAMVALS_1 PRIMARY KEY (ID,PARAM)
);

Suppose we have a four-dimensional space:

insert into CRITICALPARAMS values ('a', 1);
insert into CRITICALPARAMS values ('b', 2);
insert into CRITICALPARAMS values ('c', 3);
insert into CRITICALPARAMS values ('foo', 4);

...and some objects in that space:

insert into CRITICALPARAMVALS values (1, 'a', 0.0);
insert into CRITICALPARAMVALS values (1, 'b', 0.0);
insert into CRITICALPARAMVALS values (1, 'c', 2.0);
insert into CRITICALPARAMVALS values (1, 'foo', 99.0);
insert into CRITICALPARAMVALS values (2, 'a', 0.0);
insert into CRITICALPARAMVALS values (2, 'b', 0.0);
insert into CRITICALPARAMVALS values (2, 'c', 2.0);
insert into CRITICALPARAMVALS values (2, 'foo', 99.0);
insert into CRITICALPARAMVALS values (3, 'a', 0.0);
insert into CRITICALPARAMVALS values (3, 'b', 0.0);
insert into CRITICALPARAMVALS values (3, 'c', 1.0);
insert into CRITICALPARAMVALS values (3, 'foo', 98.0);
insert into CRITICALPARAMVALS values (4, 'a', 0.0);
insert into CRITICALPARAMVALS values (4, 'b', 0.0);
insert into CRITICALPARAMVALS values (4, 'c', 1.0);
insert into CRITICALPARAMVALS values (4, 'foo', 98.0);
insert into CRITICALPARAMVALS values (5, 'a', 0.0);
insert into CRITICALPARAMVALS values (5, 'b', 0.0);
insert into CRITICALPARAMVALS values (5, 'c', 2.0);
insert into CRITICALPARAMVALS values (5, 'foo', 98.0);

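The problem is to partition the critical-parameter space, grouping together all object IDs that have identical parameter values. We can think of using a "seed" object ID, and asking what other IDs belong to the same partition as the seed object.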

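In our example, objects 1 and 2 form one partition, objects 3 and 4 form another, and object 5 forms a third. All five objects are identical in critical parameters a and b, but differ in parameters c and foo.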
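A partition here simply means "objects with the same vector of values across all critical parameters." As a minimal reference model (plain Python rather than SQL, assuming exact float equality and that every object has a value for every parameter), grouping the sample objects by that value vector reproduces the three partitions described above. The function names are hypothetical:

```python
from collections import defaultdict

# Sample data from the question: object ID -> {critical parameter: value}
OBJECTS = {
    1: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    2: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    3: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    4: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    5: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 98.0},
}
# Critical parameters in INDX order
PARAMS = ["a", "b", "c", "foo"]


def partitions(objects, params):
    """Group object IDs whose values match exactly on every parameter."""
    groups = defaultdict(list)
    for oid in sorted(objects):
        # The tuple of values in INDX order acts as the partition key
        key = tuple(objects[oid][p] for p in params)
        groups[key].append(oid)
    return sorted(groups.values())


def same_partition(objects, params, seed):
    """Return the seed's partition, including the seed itself."""
    return next(g for g in partitions(objects, params) if seed in g)


print(partitions(OBJECTS, PARAMS))         # [[1, 2], [3, 4], [5]]
print(same_partition(OBJECTS, PARAMS, 3))  # [3, 4]
```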

Is there any way to solve this with plain vanilla SQL? What about a recursive CTE?

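I have crudely solved this with EXECUTE STATEMENT in a stored procedure, iterating over the seed's critical parameter values and manually building one big SQL statement with as many WHERE conditions as there are critical parameters, but the solution does not scale once I reach roughly 500-1000 critical parameters (or more!).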

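My current attempt fizzles out at the following point. I first defined a view that gives me a partition along a single critical parameter (TEST_FLOAT_EQ is a selectable stored procedure that compares two floats for 'good enough!' equality):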

CREATE VIEW VGROUPIDBYPARAM (SEEDID, GROUPMEMBERID, CRITPARAMINDX)
AS 
select a.id as seedid, b.id as groupmemberid, c.INDX as critparamindx
from CRITICALPARAMVALS a
join CRITICALPARAMVALS b on a.PARAM=b.param and (exists (select isequal from TEST_FLOAT_EQ(a.val, b.val, 1e-5) where ISEQUAL=1))
join CRITICALPARAMS c on b.param=c.PARAM;
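The source of TEST_FLOAT_EQ is not shown. Assuming it tests |x - y| <= tolerance (modelled below by a hypothetical Python stand-in, approx_equal), this short sketch shows why a tolerance is needed at all, and one caveat worth keeping in mind: approximate equality is not transitive, so with a tolerance the "partitions" are not guaranteed to be well defined, and membership can depend on which object is chosen as the seed.

```python
def approx_equal(x: float, y: float, tol: float = 1e-5) -> bool:
    """Hypothetical stand-in for TEST_FLOAT_EQ: absolute-tolerance comparison."""
    return abs(x - y) <= tol


# Exact comparison fails on ordinary rounding error...
exact = (0.1 + 0.2 == 0.3)
close = approx_equal(0.1 + 0.2, 0.3)

# ...but a tolerance is not transitive: x ~ y and y ~ z, yet not x ~ z
x, y, z = 0.0, 6e-6, 1.2e-5
chain = (approx_equal(x, y), approx_equal(y, z), approx_equal(x, z))

print(exact, close, chain)  # False True (True, True, False)
```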

...and then I would like to use the VGROUPIDBYPARAM view inductively, along the lines of the following partially completed select:

SELECT a1.SEEDID, a6.GROUPMEMBERID
FROM VGROUPIDBYPARAM a1
join VGROUPIDBYPARAM a2 on a1.SEEDID=a2.SEEDID and a1.GROUPMEMBERID=a2.GROUPMEMBERID
join VGROUPIDBYPARAM a3 on a1.SEEDID=a3.SEEDID and a2.GROUPMEMBERID=a3.GROUPMEMBERID
join VGROUPIDBYPARAM a4 on a1.SEEDID=a4.SEEDID and a3.GROUPMEMBERID=a4.GROUPMEMBERID
join VGROUPIDBYPARAM a5 on a1.SEEDID=a5.SEEDID and a4.GROUPMEMBERID=a5.GROUPMEMBERID
join VGROUPIDBYPARAM a6 on a1.SEEDID=a6.SEEDID and a5.GROUPMEMBERID=a6.GROUPMEMBERID
...
where a1.CRITPARAMINDX=1
and a2.CRITPARAMINDX=2
and a3.CRITPARAMINDX=3
and a4.CRITPARAMINDX=4
and a5.CRITPARAMINDX=5
and a6.CRITPARAMINDX=6
...

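At the end of this inductive process (which I hope a recursive CTE can emulate), the only records that survive the pile of JOINs are those whose group member IDs belong to the same partition as the seed ID.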
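A recursive CTE can emulate this induction: the anchor member takes the objects that match the seed on the parameter with INDX = 1, and each recursive step keeps only the survivors that also match on the next INDX. Below is a minimal sketch run on SQLite through Python's sqlite3 module as a stand-in for Firebird (Firebird 2.1 and later also support WITH RECURSIVE). It assumes INDX values are contiguous starting at 1 and uses exact equality instead of TEST_FLOAT_EQ; the CTE name CHAIN and the helpers make_db and partition_of are hypothetical. The recursion goes one level deep per critical parameter, and as far as I know Firebird caps recursion depth at 1024 levels, which the 500-1000+ parameters mentioned above would approach or exceed.

```python
import sqlite3

# Sample data from the question: object ID -> {critical parameter: value}
OBJECTS = {
    1: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    2: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    3: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    4: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    5: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 98.0},
}
PARAMS = ["a", "b", "c", "foo"]  # INDX = position + 1


def make_db() -> sqlite3.Connection:
    """Create the question's two tables in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE CRITICALPARAMS (
            PARAM VARCHAR(32) NOT NULL PRIMARY KEY,
            INDX  INTEGER NOT NULL UNIQUE);
        CREATE TABLE CRITICALPARAMVALS (
            ID    INTEGER NOT NULL,
            PARAM VARCHAR(32) NOT NULL,
            VAL   FLOAT NOT NULL,
            PRIMARY KEY (ID, PARAM));
    """)
    con.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                    [(p, i + 1) for i, p in enumerate(PARAMS)])
    con.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                    [(oid, p, v) for oid, vals in OBJECTS.items()
                     for p, v in vals.items()])
    return con


CTE_QUERY = """
WITH RECURSIVE CHAIN (MEMBERID, INDX) AS (
    -- Anchor: every object matching the seed on the parameter with INDX = 1
    SELECT b.ID, c.INDX
    FROM CRITICALPARAMS c
    JOIN CRITICALPARAMVALS s ON s.PARAM = c.PARAM AND s.ID = :SEED
    JOIN CRITICALPARAMVALS b ON b.PARAM = c.PARAM AND b.VAL = s.VAL
    WHERE c.INDX = 1
    UNION ALL
    -- Step: keep survivors that also match the seed on the next INDX
    SELECT b.ID, c.INDX
    FROM CHAIN ch
    JOIN CRITICALPARAMS c ON c.INDX = ch.INDX + 1
    JOIN CRITICALPARAMVALS s ON s.PARAM = c.PARAM AND s.ID = :SEED
    JOIN CRITICALPARAMVALS b ON b.PARAM = c.PARAM AND b.ID = ch.MEMBERID
                            AND b.VAL = s.VAL
)
-- Survivors of the last step match the seed on every parameter
SELECT MEMBERID FROM CHAIN
WHERE INDX = (SELECT MAX(INDX) FROM CRITICALPARAMS)
ORDER BY MEMBERID
"""


def partition_of(con: sqlite3.Connection, seed: int) -> list:
    """Return the seed's partition (seed included) via the recursive CTE."""
    return [row[0] for row in con.execute(CTE_QUERY, {"SEED": seed})]


con = make_db()
print(partition_of(con, 1))  # [1, 2]
print(partition_of(con, 5))  # [5]
```

Unlike the answer's query further down, this version includes the seed itself in the result, so a seed that is alone in its partition still returns one row.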

Many thanks to anyone who can help me solve this efficiently!

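To solve this, I would start with this simple query, which counts, for each pair of distinct objects, the dimensions on which they match: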

SELECT
    CPV1.ID AS ID1,
    CPV2.ID AS ID2,
    COUNT(*)
FROM
    CRITICALPARAMVALS CPV1
    INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
          AND CPV1.PARAM = CPV2.PARAM
          AND CPV1.VAL = CPV2.VAL
GROUP BY
    CPV1.ID, CPV2.ID

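Each row of its output is a pair of distinct object IDs together with the number of critical parameters on which their values are equal. The interesting rows are those where this count equals the total number of critical parameters (four in this example), i.e. pairs of objects that match on every dimension.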
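The counting query can be tried on the sample data without a Firebird server. This is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for Firebird; make_db and match_counts are hypothetical helper names:

```python
import sqlite3

# Sample data from the question: object ID -> {critical parameter: value}
OBJECTS = {
    1: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    2: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    3: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    4: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    5: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 98.0},
}
PARAMS = ["a", "b", "c", "foo"]  # INDX = position + 1


def make_db() -> sqlite3.Connection:
    """Create the question's two tables in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE CRITICALPARAMS (
            PARAM VARCHAR(32) NOT NULL PRIMARY KEY,
            INDX  INTEGER NOT NULL UNIQUE);
        CREATE TABLE CRITICALPARAMVALS (
            ID    INTEGER NOT NULL,
            PARAM VARCHAR(32) NOT NULL,
            VAL   FLOAT NOT NULL,
            PRIMARY KEY (ID, PARAM));
    """)
    con.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                    [(p, i + 1) for i, p in enumerate(PARAMS)])
    con.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                    [(oid, p, v) for oid, vals in OBJECTS.items()
                     for p, v in vals.items()])
    return con


# The counting query from the answer, unchanged apart from layout
COUNT_QUERY = """
SELECT CPV1.ID AS ID1, CPV2.ID AS ID2, COUNT(*)
FROM CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
      AND CPV1.PARAM = CPV2.PARAM
      AND CPV1.VAL = CPV2.VAL
GROUP BY CPV1.ID, CPV2.ID
"""


def match_counts(con: sqlite3.Connection) -> dict:
    """Map each ordered pair (ID1, ID2) to its number of matching parameters."""
    return {(id1, id2): n for id1, id2, n in con.execute(COUNT_QUERY)}


con = make_db()
counts = match_counts(con)
total = con.execute("SELECT COUNT(*) FROM CRITICALPARAMS").fetchone()[0]
print(counts[(1, 2)], counts[(1, 5)], counts[(1, 3)])  # 4 3 2
print(sorted(p for p, n in counts.items() if n == total))
# [(1, 2), (2, 1), (3, 4), (4, 3)]
```

All 20 ordered pairs appear in the output, because every object agrees with every other one on a and b, so the smallest count is 2.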

To keep only those rows, add this condition:

HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
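With the HAVING condition in place, only fully matching pairs remain, each listed in both orders, and an object with no match (object 5 here) produces no row at all. This minimal sketch, again assuming SQLite via Python's sqlite3 as a stand-in for Firebird, shows the filtered pairs and one way to turn them into complete partitions by starting every object in its own group so singletons are not lost. It relies on exact equality being transitive; make_db and partitions_from_pairs are hypothetical names.

```python
import sqlite3

# Sample data from the question: object ID -> {critical parameter: value}
OBJECTS = {
    1: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    2: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    3: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    4: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    5: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 98.0},
}
PARAMS = ["a", "b", "c", "foo"]  # INDX = position + 1


def make_db() -> sqlite3.Connection:
    """Create the question's two tables in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE CRITICALPARAMS (
            PARAM VARCHAR(32) NOT NULL PRIMARY KEY,
            INDX  INTEGER NOT NULL UNIQUE);
        CREATE TABLE CRITICALPARAMVALS (
            ID    INTEGER NOT NULL,
            PARAM VARCHAR(32) NOT NULL,
            VAL   FLOAT NOT NULL,
            PRIMARY KEY (ID, PARAM));
    """)
    con.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                    [(p, i + 1) for i, p in enumerate(PARAMS)])
    con.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                    [(oid, p, v) for oid, vals in OBJECTS.items()
                     for p, v in vals.items()])
    return con


# Pairs that match on every critical parameter
PAIRS_QUERY = """
SELECT CPV1.ID, CPV2.ID
FROM CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
      AND CPV1.PARAM = CPV2.PARAM
      AND CPV1.VAL = CPV2.VAL
GROUP BY CPV1.ID, CPV2.ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
ORDER BY CPV1.ID, CPV2.ID
"""


def partitions_from_pairs(con: sqlite3.Connection) -> list:
    """Build every partition, singletons included, from the matching pairs."""
    ids = [r[0] for r in con.execute(
        "SELECT DISTINCT ID FROM CRITICALPARAMVALS ORDER BY ID")]
    groups = {i: {i} for i in ids}  # each object starts in its own group
    for id1, id2 in con.execute(PAIRS_QUERY):
        groups[id1].add(id2)
    # Members of one partition end up with identical sets; deduplicate them
    return sorted({tuple(sorted(g)) for g in groups.values()})


con = make_db()
print(con.execute(PAIRS_QUERY).fetchall())  # [(1, 2), (2, 1), (3, 4), (4, 3)]
print(partitions_from_pairs(con))           # [(1, 2), (3, 4), (5,)]
```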

To answer the question of which other IDs belong to the same partition as a given "seed" object, the final query, with a :SEED parameter, looks like this:

SELECT
    CPV2.ID
FROM
    CRITICALPARAMVALS CPV1
    INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
          AND CPV1.PARAM = CPV2.PARAM
          AND CPV1.VAL = CPV2.VAL
WHERE CPV1.ID = :SEED
GROUP BY
    CPV1.ID, CPV2.ID  
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)

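It should also perform well on large datasets: unlike the dynamically built statement, the query text stays the same no matter how many critical parameters there are, because their number enters only through the COUNT(*) comparison.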
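Here is a minimal sketch of running the seeded query with a bound parameter, assuming SQLite through Python's sqlite3 module as a stand-in for Firebird (SQLite accepts :SEED as a named parameter); make_db and other_members are hypothetical helper names. Note that the result excludes the seed itself, so a seed that sits alone in its partition returns an empty set:

```python
import sqlite3

# Sample data from the question: object ID -> {critical parameter: value}
OBJECTS = {
    1: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    2: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 99.0},
    3: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    4: {"a": 0.0, "b": 0.0, "c": 1.0, "foo": 98.0},
    5: {"a": 0.0, "b": 0.0, "c": 2.0, "foo": 98.0},
}
PARAMS = ["a", "b", "c", "foo"]  # INDX = position + 1


def make_db() -> sqlite3.Connection:
    """Create the question's two tables in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE CRITICALPARAMS (
            PARAM VARCHAR(32) NOT NULL PRIMARY KEY,
            INDX  INTEGER NOT NULL UNIQUE);
        CREATE TABLE CRITICALPARAMVALS (
            ID    INTEGER NOT NULL,
            PARAM VARCHAR(32) NOT NULL,
            VAL   FLOAT NOT NULL,
            PRIMARY KEY (ID, PARAM));
    """)
    con.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                    [(p, i + 1) for i, p in enumerate(PARAMS)])
    con.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                    [(oid, p, v) for oid, vals in OBJECTS.items()
                     for p, v in vals.items()])
    return con


# The answer's seeded query; ORDER BY added only for a stable result order
SEED_QUERY = """
SELECT CPV2.ID
FROM CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
      AND CPV1.PARAM = CPV2.PARAM
      AND CPV1.VAL = CPV2.VAL
WHERE CPV1.ID = :SEED
GROUP BY CPV1.ID, CPV2.ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
ORDER BY CPV2.ID
"""


def other_members(con: sqlite3.Connection, seed: int) -> list:
    """IDs other than the seed that match it on every critical parameter."""
    return [row[0] for row in con.execute(SEED_QUERY, {"SEED": seed})]


con = make_db()
for seed in range(1, 6):
    print(seed, other_members(con, seed))
# 1 [2]
# 2 [1]
# 3 [4]
# 4 [3]
# 5 []
```

To get the whole partition, add the seed to the result. To mimic TEST_FLOAT_EQ, the join condition CPV1.VAL = CPV2.VAL could be replaced with a tolerance test such as ABS(CPV1.VAL - CPV2.VAL) <= 1e-5, keeping in mind that tolerance-based equality is not transitive.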