Challenging Firebird Recursive CTE issue
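This may not be a simple Firebird question, but I'm hoping there is a feature I don't know about that can take me beyond plain vanilla SQL.
I have two tables. The first is a list of names of "critical parameters," and the second relates object IDs, critical parameter names, and critical parameter values: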
CREATE TABLE CRITICALPARAMS
(
PARAM Varchar(32) NOT NULL,
INDX INTEGER NOT NULL,
CONSTRAINT PK_CRITICALPARAMS_1 PRIMARY KEY (PARAM),
CONSTRAINT UNQ_CRITICALPARAMS_1 UNIQUE (INDX)
);
CREATE TABLE CRITICALPARAMVALS
(
ID INTEGER NOT NULL,
PARAM Varchar(32) NOT NULL,
VAL Float NOT NULL,
CONSTRAINT PK_CRITICALPARAMVALS_1 PRIMARY KEY (ID,PARAM)
);
Suppose we have a four-dimensional space:
insert into CRITICALPARAMS values ('a', 1);
insert into CRITICALPARAMS values ('b', 2);
insert into CRITICALPARAMS values ('c', 3);
insert into CRITICALPARAMS values ('foo', 4);
...and some objects in that space:
insert into CRITICALPARAMVALS values (1, 'a', 0.0);
insert into CRITICALPARAMVALS values (1, 'b', 0.0);
insert into CRITICALPARAMVALS values (1, 'c', 2.0);
insert into CRITICALPARAMVALS values (1, 'foo', 99.0);
insert into CRITICALPARAMVALS values (2, 'a', 0.0);
insert into CRITICALPARAMVALS values (2, 'b', 0.0);
insert into CRITICALPARAMVALS values (2, 'c', 2.0);
insert into CRITICALPARAMVALS values (2, 'foo', 99.0);
insert into CRITICALPARAMVALS values (3, 'a', 0.0);
insert into CRITICALPARAMVALS values (3, 'b', 0.0);
insert into CRITICALPARAMVALS values (3, 'c', 1.0);
insert into CRITICALPARAMVALS values (3, 'foo', 98.0);
insert into CRITICALPARAMVALS values (4, 'a', 0.0);
insert into CRITICALPARAMVALS values (4, 'b', 0.0);
insert into CRITICALPARAMVALS values (4, 'c', 1.0);
insert into CRITICALPARAMVALS values (4, 'foo', 98.0);
insert into CRITICALPARAMVALS values (5, 'a', 0.0);
insert into CRITICALPARAMVALS values (5, 'b', 0.0);
insert into CRITICALPARAMVALS values (5, 'c', 2.0);
insert into CRITICALPARAMVALS values (5, 'foo', 98.0);
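The problem is to partition the critical parameter space by grouping together all object IDs that have the same parameter values. One way to frame it is to take a "seed" object ID and ask which other IDs belong to the same partition as the seed object.
In our example, objects 1 and 2 form one partition, objects 3 and 4 form another, and object 5 forms a third. All five objects have the same values for critical parameters a and b, but they differ on parameters c and foo.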
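You can check the expected partitions outside the database by using each object's values, ordered by INDX, as a single key and grouping object IDs on that key. The Python sketch below hard-codes the sample data from the inserts above. `VALS` and `partitions` are hypothetical names. The sketch also uses exact float equality, unlike the tolerance-based comparison described further down.

```python
from collections import defaultdict

# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {
    1: (0.0, 0.0, 2.0, 99.0),
    2: (0.0, 0.0, 2.0, 99.0),
    3: (0.0, 0.0, 1.0, 98.0),
    4: (0.0, 0.0, 1.0, 98.0),
    5: (0.0, 0.0, 2.0, 98.0),
}


def partitions(vals):
    """Group object IDs whose full value vectors are identical (exact equality)."""
    groups = defaultdict(list)
    for oid in sorted(vals):
        groups[vals[oid]].append(oid)
    return sorted(groups.values())


print(partitions(VALS))  # [[1, 2], [3, 4], [5]]
```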
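Is there any way to solve this with plain vanilla SQL? What about a recursive CTE?
I have solved the problem crudely with EXECUTE STATEMENT in a stored procedure. The procedure iterates over the seed's critical parameter values and builds one large SQL statement with as many WHERE conditions as there are critical parameters. This approach stops scaling once I reach roughly 500-1000 critical parameters (or more!).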
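The dynamic-SQL idea can be illustrated with a hypothetical reconstruction in Python against SQLite. This is not the original stored procedure. It generates one EXISTS predicate per critical parameter of the seed, so the statement grows linearly with the number of parameters. The helper names `make_db` and `build_statement` and the exact predicate shape are assumptions made for this sketch.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}


def make_db():
    """In-memory SQLite copy of the sample data (SQLite stands in for Firebird)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
                 " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM))")
    conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                     [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])
    return conn


def build_statement(conn, seed):
    """Hypothetical: emit one EXISTS predicate per critical parameter of the seed."""
    seed_rows = conn.execute(
        "SELECT PARAM, VAL FROM CRITICALPARAMVALS WHERE ID = ? ORDER BY PARAM", (seed,)).fetchall()
    clauses, args = [], []
    for param, val in seed_rows:
        clauses.append("EXISTS (SELECT 1 FROM CRITICALPARAMVALS v"
                       " WHERE v.ID = o.ID AND v.PARAM = ? AND v.VAL = ?)")
        args.extend([param, val])
    sql = ("SELECT DISTINCT o.ID FROM CRITICALPARAMVALS o WHERE "
           + " AND ".join(clauses) + " ORDER BY o.ID")
    return sql, args


conn = make_db()
sql, args = build_statement(conn, 1)
print(sql.count("EXISTS"))                      # 4, one per critical parameter
print([r[0] for r in conn.execute(sql, args)])  # [1, 2]
```

With 500-1000 parameters, the same builder would put 500-1000 correlated subqueries into a single statement, which is the scaling problem described above.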
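My current attempt peters out at the following point. I first defined a view that gives me a partition along a single critical parameter (TEST_FLOAT_EQ is a selectable stored procedure that checks whether two floats are 'good enough!' equal):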
CREATE VIEW VGROUPIDBYPARAM (SEEDID, GROUPMEMBERID, CRITPARAMINDX)
AS
select a.id as seedid, b.id as groupmemberid, c.INDX as critparamindx
from CRITICALPARAMVALS a
join CRITICALPARAMVALS b on a.PARAM=b.param and (exists (select isequal from TEST_FLOAT_EQ(a.val, b.val, 1e-5) where ISEQUAL=1))
join CRITICALPARAMS c on b.param=c.PARAM;
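To see what the view returns, you can emulate it in SQLite. SQLite has no selectable stored procedures, so this sketch registers a scalar Python function, `FLOAT_EQ`, as a stand-in for TEST_FLOAT_EQ. Its rule, `|x - y| <= eps`, is an assumption because the original procedure body isn't shown. The helpers `make_db` and `members` are hypothetical names.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}


def float_eq(x, y, eps):
    """Assumed stand-in for TEST_FLOAT_EQ: 1 if the values differ by at most eps."""
    return 1 if abs(x - y) <= eps else 0


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("FLOAT_EQ", 3, float_eq)
    conn.executescript(
        "CREATE TABLE CRITICALPARAMS (PARAM VARCHAR(32) NOT NULL PRIMARY KEY, INDX INTEGER NOT NULL UNIQUE);"
        "CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
        " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM));"
        # Same shape as VGROUPIDBYPARAM, with FLOAT_EQ replacing the EXISTS over TEST_FLOAT_EQ
        "CREATE VIEW VGROUPIDBYPARAM (SEEDID, GROUPMEMBERID, CRITPARAMINDX) AS"
        " SELECT a.ID, b.ID, c.INDX FROM CRITICALPARAMVALS a"
        " JOIN CRITICALPARAMVALS b ON a.PARAM = b.PARAM AND FLOAT_EQ(a.VAL, b.VAL, 1e-5) = 1"
        " JOIN CRITICALPARAMS c ON b.PARAM = c.PARAM;")
    conn.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                     [(p, i) for i, p in enumerate(PARAMS, start=1)])
    conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                     [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])
    return conn


def members(conn, seed, indx):
    """Objects that match the seed along the single critical parameter with this INDX."""
    rows = conn.execute("SELECT GROUPMEMBERID FROM VGROUPIDBYPARAM"
                        " WHERE SEEDID = ? AND CRITPARAMINDX = ? ORDER BY GROUPMEMBERID",
                        (seed, indx))
    return [r[0] for r in rows]


conn = make_db()
print(members(conn, 1, 3))  # [1, 2, 5]  (same value of c as object 1)
print(members(conn, 1, 4))  # [1, 2]     (same value of foo as object 1)
```

Each call returns one "slice" of the partition. The seed's partition is the intersection of these slices over every INDX, which is what the chain of joins below is meant to compute.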
...and then I'd like to use the VGROUPIDBYPARAM view inductively, along the lines of the following partially completed SELECT:
SELECT a1.SEEDID, a6.GROUPMEMBERID
FROM VGROUPIDBYPARAM a1
join VGROUPIDBYPARAM a2 on a1.SEEDID=a2.SEEDID and a1.GROUPMEMBERID=a2.GROUPMEMBERID
join VGROUPIDBYPARAM a3 on a1.SEEDID=a3.SEEDID and a2.GROUPMEMBERID=a3.GROUPMEMBERID
join VGROUPIDBYPARAM a4 on a1.SEEDID=a4.SEEDID and a3.GROUPMEMBERID=a4.GROUPMEMBERID
join VGROUPIDBYPARAM a5 on a1.SEEDID=a5.SEEDID and a4.GROUPMEMBERID=a5.GROUPMEMBERID
join VGROUPIDBYPARAM a6 on a1.SEEDID=a6.SEEDID and a5.GROUPMEMBERID=a6.GROUPMEMBERID
...
where a1.CRITPARAMINDX=1
and a2.CRITPARAMINDX=2
and a3.CRITPARAMINDX=3
and a4.CRITPARAMINDX=4
and a5.CRITPARAMINDX=5
and a6.CRITPARAMINDX=6
...
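At the end of this inductive process (which I hope a recursive CTE can imitate), the only records that survive the stack of JOINs carry group member IDs that belong to the same partition as the seed ID.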
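A recursive CTE can, in principle, imitate that induction. The anchor member takes every object that matches the seed on INDX = 1. Each recursive step then keeps a member only if it also matches the seed on the next INDX. This sketch runs the idea in SQLite (via Python's sqlite3) so that it is executable. Firebird's `WITH RECURSIVE` syntax is similar, but there `:seed` would be a PSQL variable or a client-side parameter.

The sketch makes three assumptions:
- INDX values are contiguous from 1.
- Values are compared with exact equality. Swap in a tolerance test to mirror TEST_FLOAT_EQ.
- The recursion depth equals the number of critical parameters, so with 500-1000 or more parameters you would need to check your Firebird version's recursion-depth limit.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}

RECURSIVE_SQL = """
WITH RECURSIVE walk (MEMBER, INDX) AS (
    -- Anchor: every object matching the seed on the parameter with INDX = 1
    SELECT b.ID, c.INDX
    FROM CRITICALPARAMS c
    JOIN CRITICALPARAMVALS a ON a.PARAM = c.PARAM AND a.ID = :seed
    JOIN CRITICALPARAMVALS b ON b.PARAM = c.PARAM AND b.VAL = a.VAL
    WHERE c.INDX = 1
    UNION ALL
    -- Step: keep a member only if it also matches the seed on parameter INDX + 1
    SELECT w.MEMBER, c.INDX
    FROM walk w
    JOIN CRITICALPARAMS c ON c.INDX = w.INDX + 1
    JOIN CRITICALPARAMVALS a ON a.PARAM = c.PARAM AND a.ID = :seed
    JOIN CRITICALPARAMVALS b ON b.PARAM = c.PARAM AND b.ID = w.MEMBER AND b.VAL = a.VAL
)
-- Members that reach the last INDX matched the seed on every parameter
SELECT MEMBER FROM walk
WHERE INDX = (SELECT MAX(INDX) FROM CRITICALPARAMS)
ORDER BY MEMBER
"""


def make_db():
    """In-memory SQLite copy of the sample schema and data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE CRITICALPARAMS (PARAM VARCHAR(32) NOT NULL PRIMARY KEY, INDX INTEGER NOT NULL UNIQUE);"
        "CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
        " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM));")
    conn.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                     [(p, i) for i, p in enumerate(PARAMS, start=1)])
    conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                     [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])
    return conn


def partition(conn, seed):
    """IDs in the seed's partition, including the seed itself."""
    return [r[0] for r in conn.execute(RECURSIVE_SQL, {"seed": seed})]


conn = make_db()
print(partition(conn, 1))  # [1, 2]
print(partition(conn, 5))  # [5]
```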
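Many thanks to anyone who can help me solve this efficiently!
To solve this, I would start with this simple query, which counts how many dimensions each object shares with every other object: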
SELECT
CPV1.ID AS ID1,
CPV2.ID AS ID2,
COUNT(*)
FROM
CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
AND CPV1.PARAM = CPV2.PARAM
AND CPV1.VAL = CPV2.VAL
GROUP BY
CPV1.ID, CPV2.ID
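The output is as follows:
As you can see, the interesting rows, highlighted in yellow, are the pairs whose count equals the number of critical parameters.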
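To reproduce that output locally, you can run the same query against the sample data in SQLite through Python's sqlite3 module. SQLite is only a convenient stand-in for Firebird here, and an ORDER BY has been added so the rows come back in a stable order.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}

PAIR_COUNTS_SQL = """
SELECT CPV1.ID AS ID1, CPV2.ID AS ID2, COUNT(*)
FROM CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
    AND CPV1.PARAM = CPV2.PARAM
    AND CPV1.VAL = CPV2.VAL
GROUP BY CPV1.ID, CPV2.ID
ORDER BY CPV1.ID, CPV2.ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
             " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM))")
conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                 [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])

# Map each ordered pair of distinct objects to the number of dimensions they share
counts = {(id1, id2): n for id1, id2, n in conn.execute(PAIR_COUNTS_SQL)}
print(counts[(1, 2)], counts[(1, 5)], counts[(1, 3)])  # 4 3 2
```

Every pair shares parameters a and b, so all 20 ordered pairs appear. Only the pairs that match on all four parameters are of interest.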
To keep only those rows, add this condition:
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
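With the HAVING clause added, only the fully matching pairs remain. The SQLite sketch below assumes that every object has exactly one value for every row in CRITICALPARAMS. The comparison with `SELECT COUNT(*) FROM CRITICALPARAMS` relies on that, and nothing in the schema enforces it.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}

FULL_MATCH_SQL = """
SELECT CPV1.ID, CPV2.ID, COUNT(*)
FROM CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
    AND CPV1.PARAM = CPV2.PARAM
    AND CPV1.VAL = CPV2.VAL
GROUP BY CPV1.ID, CPV2.ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
ORDER BY CPV1.ID, CPV2.ID
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE CRITICALPARAMS (PARAM VARCHAR(32) NOT NULL PRIMARY KEY, INDX INTEGER NOT NULL UNIQUE);"
    "CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
    " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM));")
conn.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                 [(p, i) for i, p in enumerate(PARAMS, start=1)])
conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                 [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])

# Ordered pairs of distinct objects that agree on every critical parameter
pairs = [(id1, id2) for id1, id2, _ in conn.execute(FULL_MATCH_SQL)]
print(pairs)  # [(1, 2), (2, 1), (3, 4), (4, 3)]
```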
We can think of using a "seed" object ID, and asking what other IDs belong to the same partition as the seed object.
The final query that answers this question, given a :SEED parameter, looks like this:
SELECT
CPV2.ID
FROM
CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
AND CPV1.PARAM = CPV2.PARAM
AND CPV1.VAL = CPV2.VAL
WHERE CPV1.ID = :SEED
GROUP BY
CPV1.ID, CPV2.ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
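The seeded query can be wrapped in a small function for experimentation. This sketch runs it in SQLite through Python's sqlite3 module. `same_partition` and its `include_seed` flag are hypothetical additions.

Two points are worth noting. First, because of `CPV1.ID <> CPV2.ID`, the seed itself is never returned, so a singleton partition such as object 5 yields an empty result. Dropping that condition includes the seed. Second, the query compares Float values with exact `=`, whereas the question uses a tolerance (TEST_FLOAT_EQ), so near-equal values will not match.

```python
import sqlite3

PARAMS = ["a", "b", "c", "foo"]
# Sample data from the INSERT statements above; tuple order is (a, b, c, foo)
VALS = {1: (0.0, 0.0, 2.0, 99.0), 2: (0.0, 0.0, 2.0, 99.0), 3: (0.0, 0.0, 1.0, 98.0),
        4: (0.0, 0.0, 1.0, 98.0), 5: (0.0, 0.0, 2.0, 98.0)}

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE CRITICALPARAMS (PARAM VARCHAR(32) NOT NULL PRIMARY KEY, INDX INTEGER NOT NULL UNIQUE);"
    "CREATE TABLE CRITICALPARAMVALS (ID INTEGER NOT NULL, PARAM VARCHAR(32) NOT NULL,"
    " VAL FLOAT NOT NULL, PRIMARY KEY (ID, PARAM));")
conn.executemany("INSERT INTO CRITICALPARAMS VALUES (?, ?)",
                 [(p, i) for i, p in enumerate(PARAMS, start=1)])
conn.executemany("INSERT INTO CRITICALPARAMVALS VALUES (?, ?, ?)",
                 [(oid, p, v) for oid, vs in VALS.items() for p, v in zip(PARAMS, vs)])


def same_partition(conn, seed, include_seed=False):
    """Objects matching the seed on every critical parameter (the answer's query)."""
    # Only a fixed literal is spliced in here; the seed itself is bound as a parameter
    self_filter = "" if include_seed else "CPV1.ID <> CPV2.ID AND "
    sql = f"""
        SELECT CPV2.ID
        FROM CRITICALPARAMVALS CPV1
        INNER JOIN CRITICALPARAMVALS CPV2 ON {self_filter}CPV1.PARAM = CPV2.PARAM
            AND CPV1.VAL = CPV2.VAL
        WHERE CPV1.ID = :seed
        GROUP BY CPV1.ID, CPV2.ID
        HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
        ORDER BY CPV2.ID
    """
    return [r[0] for r in conn.execute(sql, {"seed": seed})]


print(same_partition(conn, 1))                     # [2]
print(same_partition(conn, 5))                     # []
print(same_partition(conn, 5, include_seed=True))  # [5]
```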
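Unlike the dynamic EXECUTE STATEMENT approach, this query's text does not grow with the number of critical parameters, so it should perform well even for large data sets.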