How can I write SQL for a self-referencing SELECT where some field is NOT LIKE the same field, plus a few other criteria?

I have the following (simplified) table:

CREATE TABLE IF NOT EXISTS `resource` (
  `id`         INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  `host`       TEXT NOT NULL,
  `inspecting` INTEGER DEFAULT 0,
  `visitedAt`  TEXT
);

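There can be multiple records with the same host value, and/or records whose host is considered a subdomain of another record's host. For example: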

 id |      host       | inspecting |      visitedAt
---------------------------------------------------------
  1 |     example.com |          0 |                null
  2 |     example.com |          0 | 2020-09-28 00:00:00
  3 | sub.example.com |          1 |                null
  4 |     example.org |          0 |                null

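So, these hosts may have been visited already or may currently be under inspection. I want to find the oldest host that has not been visited recently, is not currently being inspected, and is not considered a subdomain of a host that was visited recently or is currently being inspected.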

So, if example.com is currently being inspected or was visited recently, I don't want to match example.com or sub.example.com. In the sample data above, example.org should match.
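The matching rule described above (same host, or a subdomain of it) can be tried out in isolation. This is only a minimal sketch: it assumes Python's built-in sqlite3 module as a harness, and the helper name `is_same_or_subdomain` is hypothetical, not something from the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def is_same_or_subdomain(candidate, parent):
    """Evaluate the host-matching predicate inside SQLite itself.

    True when candidate equals parent, or when candidate ends with
    '.' followed by parent (i.e. it is a subdomain of parent).
    """
    row = conn.execute(
        "SELECT :c = :p OR :c LIKE '%.' || :p",
        {"c": candidate, "p": parent},
    ).fetchone()
    return bool(row[0])

for candidate in ("example.com", "sub.example.com", "notexample.com", "example.org"):
    print(candidate, is_same_or_subdomain(candidate, "example.com"))
```

The leading dot in `'%.'` is what keeps notexample.com from being treated as a subdomain of example.com. Note that SQLite's LIKE is case-insensitive for ASCII by default and treats `_` and `%` inside the host as wildcards, so this check is looser than a strict string-suffix comparison.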

I have tried various queries with JOINs and WHERE (NOT) EXISTS, but I just can't get it to work.

The closest I have come is a query like this (though it may not be accurate):

SELECT `self`.*
FROM `resource` AS `self`
WHERE 
  `self`.`inspecting` != 1 AND 
  (`self`.`visitedAt` IS NULL OR datetime( `self`.`visitedAt` ) <= datetime( 'now', '-10 minutes' )) AND
  NOT EXISTS (  
    SELECT 1
    FROM 
      `resource` AS `probe`
    WHERE
      `probe`.`inspecting` = 1 AND 
      (`self`.`host` = `probe`.`host` OR `self`.`host` LIKE "%." || `probe`.`host`) AND 
      (`probe`.`visitedAt` IS NOT NULL AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' ))
  )
ORDER BY `self`.`visitedAt` ASC
LIMIT 1

Is it possible to filter out rows like this with a single query?

You can use a common table expression with subqueries in the SELECT list to create conditional columns, like this:

WITH `t` AS (
  SELECT  
    -- Create a conditional column `inspectionFlag`
    CASE WHEN
    (
    -- Checks whether any related host (same host, parent, or subdomain) has inspecting = 1
     SELECT 1 FROM `resource` AS `probe` 
      WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
      AND `probe`.`inspecting` = 1
    ) IS NOT NULL THEN 1 ELSE 0 END
    AS `inspectionFlag`,
    CASE 
    WHEN     
    -- Checks whether any related host (same host, parent, or subdomain) was visited in the last 10 minutes
    (SELECT 1 FROM `resource` AS `probe`
            WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
      AND `probe`.`visitedAt` IS NOT NULL
      AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
    ) IS NOT NULL THEN 1 ELSE 0 END 
    AS `visitedFlag`
  , `self`.*
  FROM `resource` AS `self`
) 
SELECT `t`.* FROM `t`

The result should now look like this:

| inspectionFlag | visitedFlag | id  | host            | inspecting | visitedAt           |
| -------------- | ----------- | --- | --------------- | ---------- | ------------------- |
| 1              | 1           | 1   | example.com     | 0          |                     |
| 1              | 1           | 2   | example.com     | 0          | 2020-09-28 08:00:00 |
| 1              | 1           | 3   | sub.example.com | 1          |                     |
| 0              | 0           | 4   | example.org     | 0          |                     |
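Because the query depends on `datetime('now')`, the flags above only come out this way if it runs within ten minutes of the sample visit. Here is a minimal sketch for checking them reproducibly. It assumes Python's built-in sqlite3 module as a harness and a hypothetical `:now` parameter in place of the literal `'now'` (neither is part of the query above); the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (
  id         INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  host       TEXT NOT NULL,
  inspecting INTEGER DEFAULT 0,
  visitedAt  TEXT
);
INSERT INTO resource (host, inspecting, visitedAt) VALUES
  ('example.com',     0, NULL),
  ('example.com',     0, '2020-09-28 00:00:00'),
  ('sub.example.com', 1, NULL),
  ('example.org',     0, NULL);
""")

# Same shape as the CTE above; the only change is :now (an assumed,
# hypothetical parameter) in place of the literal 'now'.
FLAGS_SQL = """
WITH t AS (
  SELECT
    CASE WHEN (
      SELECT 1 FROM resource AS probe
      WHERE (probe.host LIKE '%.' || self.host
             OR self.host LIKE '%.' || probe.host
             OR self.host = probe.host)
        AND probe.inspecting = 1
    ) IS NOT NULL THEN 1 ELSE 0 END AS inspectionFlag,
    CASE WHEN (
      SELECT 1 FROM resource AS probe
      WHERE (probe.host LIKE '%.' || self.host
             OR self.host LIKE '%.' || probe.host
             OR self.host = probe.host)
        AND probe.visitedAt IS NOT NULL
        AND datetime(probe.visitedAt) > datetime(:now, '-10 minutes')
    ) IS NOT NULL THEN 1 ELSE 0 END AS visitedFlag,
    self.id,
    self.host
  FROM resource AS self
)
SELECT id, host, inspectionFlag, visitedFlag FROM t ORDER BY id
"""

def compute_flags(now):
    """Return (id, host, inspectionFlag, visitedFlag) rows for a pinned clock."""
    return conn.execute(FLAGS_SQL, {"now": now}).fetchall()

for row in compute_flags("2020-09-28 00:05:00"):
    print(row)
```

With the clock pinned five minutes after the sample visit, only example.org (id 4) comes back with both flags at 0. Pinning it an hour later clears visitedFlag on every row but leaves inspectionFlag set for example.com and sub.example.com.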

Now just use the new "flag" columns to filter the hosts, like this:

WITH `t` AS (
  SELECT  
    CASE WHEN
    (
     SELECT 1 FROM `resource` AS `probe` 
      WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
      AND `probe`.`inspecting` = 1
    ) IS NOT NULL THEN 1 ELSE 0 END
    AS `inspectionFlag`,
    CASE 
    WHEN     
    (SELECT 1 FROM `resource` AS `probe`
            WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
      AND `probe`.`visitedAt` IS NOT NULL
      AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
    ) IS NOT NULL THEN 1 ELSE 0 END 
    AS `visitedFlag`
  , `self`.*
  FROM `resource` AS `self`
) 
SELECT `t`.* FROM `t`
WHERE `t`.`inspectionFlag` = 0 AND `t`.`visitedFlag` = 0

Result:

| inspectionFlag | visitedFlag | id  | host        | inspecting | visitedAt |
| -------------- | ----------- | --- | ----------- | ---------- | --------- |
| 0              | 0           | 4   | example.org | 0          |           |
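The filtered query has no ORDER BY, so it returns every eligible row in unspecified order, while the question asks for the single oldest host. Below is a sketch of one way to finish it, which is a variation rather than the query above: it swaps the two flag columns for one NOT EXISTS and adds ORDER BY with LIMIT 1. The Python sqlite3 harness, the `:now` parameter, the `pick_next` helper, and the extra example.net row are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (
  id         INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  host       TEXT NOT NULL,
  inspecting INTEGER DEFAULT 0,
  visitedAt  TEXT
);
INSERT INTO resource (host, inspecting, visitedAt) VALUES
  ('example.com',     0, NULL),
  ('example.com',     0, '2020-09-28 00:00:00'),
  ('sub.example.com', 1, NULL),
  ('example.org',     0, NULL),
  ('example.net',     0, '2020-09-27 12:00:00');  -- hypothetical extra row
""")

# A row is eligible when no related row (same host, parent, or subdomain,
# including the row itself) is being inspected or was visited recently.
NEXT_HOST_SQL = """
SELECT self.id, self.host
FROM resource AS self
WHERE NOT EXISTS (
  SELECT 1
  FROM resource AS probe
  WHERE (probe.host = self.host
         OR probe.host LIKE '%.' || self.host
         OR self.host LIKE '%.' || probe.host)
    AND (probe.inspecting = 1
         OR datetime(probe.visitedAt) > datetime(:now, '-10 minutes'))
)
ORDER BY self.visitedAt ASC, self.id ASC
LIMIT 1
"""

def pick_next(now):
    """Return (id, host) of the oldest eligible row, or None if there is none."""
    return conn.execute(NEXT_HOST_SQL, {"now": now}).fetchone()

print(pick_next("2020-09-28 00:05:00"))
```

SQLite sorts NULL before any other value in ascending order, so hosts that were never visited are picked ahead of hosts with an old visitedAt. The inner condition uses OR between the inspecting and recently-visited checks, so either one alone is enough to block a host and its relatives.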