Find availability in multiple calendars with relation using MySQL

Question

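I own something that is rented out frequently. It consists of several parts, and it can be rented out either in part or as a whole. If one of the parts is rented out, the whole cannot be rented.

As an example, say I rent out a car. The car's tires can also be rented. You can choose to rent the car with its tires («the whole»), or just a single tire. However, if one or more tires are rented out, you cannot rent the car («the whole»).

I picture it as a hierarchy.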

         Whole
    _______|_______
   |               |
 Part 1           Part 2

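I have been using one Google Calendar for the «whole» thing and a separate calendar for each of its parts. This works, but it is annoying, and I would like to be able to send interested people a link where they can see what is available.

So I made a very simple database (MariaDB 10.4) with two tables: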

# tbl: part
| id | parent_id | name   |

The parent_id column simply references another row in the same table. Here is some sample data:

| 1  | NULL      | Car    |
| 2  | 1         | Tire 1 |
| 3  | 1         | Tire 2 |

Then there is the following table, which stores the dates on which each part is booked (plus sample data).

# tbl: booking
| id | part_id | booked_from | booked_until |
--------------------------------------------
| 1  | 1       | 2021-07-31  | 2021-08-03   |
| 2  | 2       | 2021-08-03  | 2021-08-07   |
| 3  | 3       | 2021-08-04  | 2021-08-06   |
| 4  | 3       | 2021-08-09  | 2021-08-10   |

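From this we know that the car itself is booked from 2021-07-31 to 2021-08-03, but it cannot be booked again until 2021-08-06, because two tires are rented out in the meantime (the tires can be rented out at the same time as each other, since they are not strictly related). After that it is only available until 2021-08-09, when a tire is booked again.

What I am looking for is a query that returns a list of available dates. From the part table I can work out which parts are related, so that is not my biggest problem (I think), since something like this can be used when querying availability:

Car: part_id IN(1,2,3)
Tire 1: part_id IN(1,2)
Tire 2: part_id IN(1,3)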
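
As a rough illustration of how those IN lists could be derived from the part table, here is a minimal Python sketch. It assumes the two-level hierarchy from the sample data (a part is related to itself, its direct parent, and its direct children); the `PARTS` dictionary and the `related_ids` helper are names made up for this example.

```python
# Sample rows of the `part` table: id -> (parent_id, name)
PARTS = {
    1: (None, "Car"),
    2: (1, "Tire 1"),
    3: (1, "Tire 2"),
}


def related_ids(part_id, parts=PARTS):
    """Return the part itself, its direct parent and its direct children."""
    related = {part_id}
    parent_id = parts[part_id][0]
    if parent_id is not None:
        related.add(parent_id)  # a booked parent blocks this part
    for other_id, (other_parent, _name) in parts.items():
        if other_parent == part_id:
            related.add(other_id)  # a booked child blocks this part
    return sorted(related)


for part_id, (_parent, name) in PARTS.items():
    print(name, related_ids(part_id))
```

Siblings are deliberately left out, which is why Tire 1 and Tire 2 do not appear in each other's lists.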

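My question is (simply?) how to write a query that returns only the available dates, especially for the car, where booked dates can overlap.

For example, for the car: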

SELECT 
  `booked_until` AS `available-from`, 
  `booked_from` as `available-until`
FROM 
  booking
/* some JOIN magic? */
WHERE part_id IN(1,2,3)

For Tire 1 it would be the same, but with part_id IN(1,2), because Tire 2 (id: 3) is not directly related to Tire 1.

The two should return, respectively:

# car
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-06     | 2021-08-09      |
| 2021-08-10     | NULL            |

# tire 1
| available-from | available-until |
------------------------------------
| NULL           | 2021-07-31      |
| 2021-08-07     | NULL            |

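A NULL value simply means that there is no booking before or after. For example, this tire is available from now until 2021-07-31, and from 2021-08-07 until the last day the Earth exists.

I hope this makes sense, and that someone is able to help.

Thanks in advance.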

Answer 1

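OK, here is my attempt.

If I understand correctly, besides the information given explicitly in the table, there is also implied unavailability of units. So I first retrieved this explicitly: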

select unit_id, entry_start, entry_end 
  from unit_calendar_entry
union
select p.id, u.entry_start, u.entry_end
 from unit p
 join unit_calendar_entry u
   on  p.parent_unit_id = u.unit_id
union
select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
 from unit p
 join unit_calendar_entry u
   on p.id = u.unit_id
  and p.parent_unit_id is not null
order by unit_id, entry_start;

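The first select simply gets the entries from the table.
The second adds entries for the tires, since they can be considered booked whenever the car is booked.
The third adds entries for the car, since it can be considered booked if any one of its tires is booked.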
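
To make the effect of the union easier to follow, here is a small Python sketch of the same idea on the sample data. The `UNITS` and `ENTRIES` structures and the function name are assumptions made for this illustration, and dates are kept as ISO strings, which sort chronologically.

```python
# unit: id -> parent_unit_id (sample data from the question)
UNITS = {1: None, 2: 1, 3: 1}

# unit_calendar_entry: (unit_id, entry_start, entry_end)
ENTRIES = [
    (1, "2021-07-31", "2021-08-03"),
    (2, "2021-08-03", "2021-08-07"),
    (3, "2021-08-04", "2021-08-06"),
    (3, "2021-08-09", "2021-08-10"),
]


def implied_unavailability(units, entries):
    """Explicit bookings plus the ones implied by the parent/child relation."""
    rows = set(entries)  # a set removes duplicates, like UNION does
    for unit_id, start, end in entries:
        # A booked parent blocks each of its direct children.
        for child_id, parent_id in units.items():
            if parent_id == unit_id:
                rows.add((child_id, start, end))
        # A booked child blocks its direct parent.
        if units[unit_id] is not None:
            rows.add((units[unit_id], start, end))
    return sorted(rows)


rows = implied_unavailability(UNITS, ENTRIES)
for row in rows:
    print(row)
```

This should give nine rows: four for unit 1, two for unit 2 and three for unit 3.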

Result:

unit_id     entry_start           entry_end
----------------------------------------------------
1          2021-07-31 00:00:00   2021-08-03 00:00:00
1          2021-08-03 00:00:00   2021-08-07 00:00:00
1          2021-08-04 00:00:00   2021-08-06 00:00:00
1          2021-08-09 00:00:00   2021-08-10 00:00:00
2          2021-07-31 00:00:00   2021-08-03 00:00:00
2          2021-08-03 00:00:00   2021-08-07 00:00:00
3          2021-07-31 00:00:00   2021-08-03 00:00:00
3          2021-08-04 00:00:00   2021-08-06 00:00:00
3          2021-08-09 00:00:00   2021-08-10 00:00:00

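Based on this, grouping adjacent or overlapping time spans means solving a gaps-and-islands problem. You can use two queries to mark the entries that belong together. If we call the query above subtab, the corresponding statement is: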

select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
         from (
              select subtab.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                from subtab
              ) c

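The inner query gets, for each row, the latest end of all preceding entries of the same unit (prev_end).
The outer query assigns a grouping ID that, together with unit_id, identifies all entries belonging to one contiguous block (also known as an island). The running sum is not restarted per unit, so the ID is only meaningful in combination with unit_id.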
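
The same labelling can be sketched procedurally. The Python below is only an illustration of the logic, not a replacement for the window functions: `ROWS` is the expanded sample data, `label_islands` is a made-up name, and the counter intentionally keeps running across units just like the SUM() above.

```python
# Expanded entries: (unit_id, entry_start, entry_end), ISO date strings
ROWS = [
    (1, "2021-07-31", "2021-08-03"), (1, "2021-08-03", "2021-08-07"),
    (1, "2021-08-04", "2021-08-06"), (1, "2021-08-09", "2021-08-10"),
    (2, "2021-07-31", "2021-08-03"), (2, "2021-08-03", "2021-08-07"),
    (3, "2021-07-31", "2021-08-03"), (3, "2021-08-04", "2021-08-06"),
    (3, "2021-08-09", "2021-08-10"),
]


def label_islands(rows):
    """Return (unit_id, start, end, prev_end, grouping) for every entry."""
    labelled = []
    grouping = 0
    current_unit = None
    prev_end = None
    for unit_id, start, end in sorted(rows):
        if unit_id != current_unit:
            # New partition: there is no preceding entry for this unit yet.
            current_unit, prev_end = unit_id, None
        if prev_end is not None and prev_end < start:
            grouping += 1  # a gap before this entry starts a new island
        labelled.append((unit_id, start, end, prev_end, grouping))
        # Running maximum of all ends seen so far within the unit.
        prev_end = end if prev_end is None else max(prev_end, end)
    return labelled


labelled = label_islands(ROWS)
for row in labelled:
    print(row)
```

Keeping the running maximum rather than just the previous row's end is what lets an entry that lies fully inside an earlier one stay in the same island.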

Result:

unit_id entry_start          entry_end            prev_end             grouping
-------------------------------------------------------------------------------
1       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               0
1       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  0
1       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-07 00:00:00  0
1       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-07 00:00:00  1
2       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
2       2021-08-03 00:00:00  2021-08-07 00:00:00  2021-08-03 00:00:00  1
3       2021-07-31 00:00:00  2021-08-03 00:00:00  (null)               1
3       2021-08-04 00:00:00  2021-08-06 00:00:00  2021-08-03 00:00:00  2
3       2021-08-09 00:00:00  2021-08-10 00:00:00  2021-08-06 00:00:00  3

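From this (let's call it tab), you can either get the unavailable time spans by grouping on unit_id and grouping (see the db<>fiddle below), or calculate the free time spans as follows: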

select distinct unit_id
              , NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from
              , min(entry_start) over (partition by unit_id, grouping) as available_til
   from tab
union 
select distinct unit_id
                , max(entry_end) over (partition by unit_id) as available_from
                , null as available_til
 from tab
order by unit_id, available_from

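For each unit_id/grouping, the first query gets available_from as the minimum of prev_end and available_til as the minimum of entry_start. To get a NULL value out of MIN(), I used an approach similar to this SO answer: NULL is replaced by a sentinel date ('1000-01-01') before MIN() and turned back into NULL afterwards with NULLIF.
The second query adds one row per unit_id, with the maximum entry_end as the start and NULL as the end.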
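
For comparison, the free spans can also be computed in a single pass per unit. The following Python sketch assumes the expanded sample entries as input (`ROWS` and `free_spans` are names chosen for this example) and uses None where the query returns NULL.

```python
from itertools import groupby

# Expanded entries: (unit_id, entry_start, entry_end), ISO date strings
ROWS = [
    (1, "2021-07-31", "2021-08-03"), (1, "2021-08-03", "2021-08-07"),
    (1, "2021-08-04", "2021-08-06"), (1, "2021-08-09", "2021-08-10"),
    (2, "2021-07-31", "2021-08-03"), (2, "2021-08-03", "2021-08-07"),
    (3, "2021-07-31", "2021-08-03"), (3, "2021-08-04", "2021-08-06"),
    (3, "2021-08-09", "2021-08-10"),
]


def free_spans(rows):
    """Return (unit_id, available_from, available_til); None means open-ended."""
    result = []
    for unit_id, entries in groupby(sorted(rows), key=lambda row: row[0]):
        prev_end = None
        for _unit, start, end in entries:
            # A free span ends where a new island begins.
            if prev_end is None or prev_end < start:
                result.append((unit_id, prev_end, start))
            prev_end = end if prev_end is None else max(prev_end, end)
        # After the last booking the unit is free with no known end.
        result.append((unit_id, prev_end, None))
    return result


spans = free_spans(ROWS)
for span in spans:
    print(span)
```

On the sample data this should reproduce the nine rows of the result table.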

Result:

unit_id  available_from         available_til
---------------------------------------------------
1        (null)                 2021-07-31 00:00:00
1        2021-08-07 00:00:00    2021-08-09 00:00:00
1        2021-08-10 00:00:00    (null)
2        (null)                 2021-07-31 00:00:00
2        2021-08-07 00:00:00    (null)
3        (null)                 2021-07-31 00:00:00
3        2021-08-03 00:00:00    2021-08-04 00:00:00
3        2021-08-06 00:00:00    2021-08-09 00:00:00
3        2021-08-10 00:00:00    (null)

Putting everything into one query:

with tab as (
            select c.*, sum(case when prev_end < entry_start then 1 else 0 end) over (order by unit_id, entry_start) as grouping   
              from (
                   select d.*, max(entry_end) over (partition by unit_id order by entry_start rows between unbounded preceding and 1 preceding) as prev_end 
                     from (
                      select unit_id, entry_start, entry_end 
                        from unit_calendar_entry
                      union
                      select p.id, u.entry_start, u.entry_end
                       from unit p
                       join unit_calendar_entry u
                         on p.parent_unit_id = u.unit_id
                      union
                      select p.parent_unit_id as unit_id, u.entry_start, u.entry_end 
                       from unit p
                       join unit_calendar_entry u
                         on p.id = u.unit_id
                        and p.parent_unit_id is not null
                          ) d
                   ) c
            ) 
 select distinct unit_id, NULLIF((min(ifnull(prev_end,'1000-01-01')) over (partition by unit_id, grouping)),'1000-01-01') as available_from, min(entry_start) over (partition by unit_id, grouping) as available_til
   from tab
   union 
  select distinct unit_id, max(entry_end) over (partition by unit_id) as available_from, null as available_til
   from tab
 order by unit_id, available_from

See also this db<>fiddle.

Answer 2

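Thank you very much, this was very helpful.

Since this query is really complex for me and I would like to understand it fully, I have a few questions.

How can I select the results for unit_id = 1 only? I tried a WHERE clause on the outer query, but it had no effect. When specifying WHERE unit_id IN(1,2,3), it would be enough to return the result for unit 1, where all calendars intersect (that would be the rows returned for unit 1 in the db<>fiddle).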

unit_id  available_from         available_til
---------------------------------------------------
1        (null)                 2021-07-31 00:00:00
1        2021-08-07 00:00:00    2021-08-09 00:00:00
1        2021-08-10 00:00:00    (null)

How can I restrict the result to a date range, say 2021-08-01 to 2021-08-30?
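
One possible way to think about the date range, sketched in Python rather than SQL: compute the free spans from all bookings first, then intersect them with the window, so that an open end (NULL) becomes the window boundary. The `SPANS` list copies the unit 1 and unit 2 rows from the result above; `clip_to_window` is a hypothetical helper, not part of the answer's query.

```python
# (unit_id, available_from, available_til); None stands for NULL
SPANS = [
    (1, None, "2021-07-31"),
    (1, "2021-08-07", "2021-08-09"),
    (1, "2021-08-10", None),
    (2, None, "2021-07-31"),
    (2, "2021-08-07", None),
]


def clip_to_window(spans, unit_id, window_start, window_end):
    """Keep one unit's free spans and cut them down to the given window."""
    clipped = []
    for unit, free_from, free_til in spans:
        if unit != unit_id:
            continue
        # An open end is replaced by the corresponding window boundary.
        start = window_start if free_from is None else max(free_from, window_start)
        end = window_end if free_til is None else min(free_til, window_end)
        if start < end:  # drop spans that fall completely outside the window
            clipped.append((start, end))
    return clipped


window = clip_to_window(SPANS, 1, "2021-08-01", "2021-08-30")
print(window)
```

Because the clipping happens after the gaps are computed, bookings outside the window still shape the result, and no NULL is left to suggest availability beyond the window.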

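Thanks again!

Edit: What happens if I add "bolts" (used to mount the tires) to the equation, giving the hierarchy another level? Would this solution still work?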

                                    Whole
                               _______|_______
                              |               |
                            Part 1           Part 2
                        ______|____          ...
                        |          |
                     Bolt 1 .... Bolt N
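
For a hierarchy of arbitrary depth, one option might be to collect all ancestors and descendants of a unit with a recursive CTE. The sketch below runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. It reuses the part table from the question and adds a hypothetical 'Bolt 1' row. MariaDB 10.4 also supports WITH RECURSIVE, but this exact statement has not been run there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO part VALUES
        (1, NULL, 'Car'),
        (2, 1, 'Tire 1'),
        (3, 1, 'Tire 2'),
        (4, 2, 'Bolt 1');  -- hypothetical third level, not in the sample data
""")

RELATED_SQL = """
WITH RECURSIVE
  ancestors(id, parent_id) AS (
    SELECT id, parent_id FROM part WHERE id = :id
    UNION ALL
    SELECT p.id, p.parent_id
      FROM part p JOIN ancestors a ON p.id = a.parent_id
  ),
  descendants(id) AS (
    SELECT id FROM part WHERE id = :id
    UNION ALL
    SELECT p.id
      FROM part p JOIN descendants d ON p.parent_id = d.id
  )
SELECT id FROM ancestors
UNION
SELECT id FROM descendants
ORDER BY id
"""


def related_ids(part_id):
    """Ids whose bookings block the given part: itself, ancestors, descendants."""
    return [row[0] for row in conn.execute(RELATED_SQL, {"id": part_id})]


for part_id in (1, 2, 3, 4):
    print(part_id, related_ids(part_id))
```

With such a list, a booked bolt would block its tire and the car, while the other tire stays unaffected.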

Answer 3

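OK, so I had to extend the solution a bit. I added another table that holds specific data for each item's entries, for example when the item needs to be returned by, and how long it takes to restore «the thing» to its original state. This is only an approximation and not critical, but it should keep anyone from coming to collect it before a certain time. :)

So the tables now look like this: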

# tbl: property_unit (former: part)
| id | parent_id | identifier   |

# tbl: property_unit_calendar (NEW)
| id | property_unit_id | return_by | preparation_time |

# tbl: property_unit_calendar_entry (former: booking)
| id | calendar_id | entry_start | entry_end |

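This does not affect the current query much, because the times in the return_by and preparation_time columns of property_unit_calendar are applied to the datetime fields of property_unit_calendar_entry at booking time.

I adapted the code and added this relation. It seems to work fine (thanks a lot again).

Updated db<>fiddle

Now I am trying to figure out how to reduce the result to the actual unit whose availability I am checking. Should I add the condition to every select from tab? Like this: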

FROM tab
WHERE property_unit = 8
UNION
SELECT DISTINCT
  ...
FROM tab
WHERE property_unit = 8
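
An alternative that might avoid repeating the condition is to wrap the whole UNION in a derived table and filter once on the outside. The sketch below shows only that pattern, using SQLite through Python's sqlite3 module. The `tab` table, its rows, and the two simplified branches (which produce just the open-ended first and last rows) are stand-ins invented for this example, not the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the `tab` CTE; the rows are made up.
    CREATE TABLE tab (property_unit INTEGER, entry_start TEXT, entry_end TEXT);
    INSERT INTO tab VALUES
        (8, '2021-08-03', '2021-08-07'),
        (8, '2021-08-09', '2021-08-10'),
        (9, '2021-08-04', '2021-08-06');
""")

FILTER_ONCE_SQL = """
SELECT *
  FROM (
        SELECT property_unit,
               NULL AS available_from,
               MIN(entry_start) AS available_til
          FROM tab
         GROUP BY property_unit
        UNION
        SELECT property_unit, MAX(entry_end), NULL
          FROM tab
         GROUP BY property_unit
       ) AS result
 WHERE property_unit = ?
 ORDER BY available_from
"""

rows = conn.execute(FILTER_ONCE_SQL, (8,)).fetchall()
for row in rows:
    print(row)
```

Filtering inside every branch gives the same rows; the wrapper just keeps the condition in one place.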

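Another thing I am struggling with is how to reduce the result to a specific time frame, for example between two dates, from a certain date, or until a certain date.

The main problem is not how to get the results between/from/until dates, but that when there are bookings outside the time frame, I still get NULL in the first and last rows, which is supposed to indicate availability for the "unpredictable past/future" (and that may not be true).

So... would it be best to add an extra query for the NULLs that checks whether there are bookings before or after the frame?

I hope I explained that well enough. So many dependencies to consider!

Thanks!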

