根据可用时间段/日历将人们分组的算法

Question

我生成了一个样本数据集，其中包含 1000 人以及他们白天的可用时间段。每个时隙是一天中 30 分钟的间隔。 0 表示他们在该时隙内空闲，1 表示他们忙。

例如：

| Time        | Sally | Mark | Nish |
| ------------| ----- | ---- | ---- |
| 0900 - 0930 |   0   |  1   |   1  |
| 0930 - 1000 |   1   |  0   |   1  |
| 1000 - 1030 |   1   |  1   |   1  |
| 1030 - 1100 |   1   |  0   |   1  |
| 1100 - 1130 |   0   |  1   |   1  |
| 1200 - 1230 |   1   |  0   |   0  |

我想创建最多 5 人的小组，这些小组至少有一个共同的可用时间段。每个组应该是相互排斥的。我想最大限度地增加创建的成功组的数量。

目前，我使用的是一个非常粗糙的算法。我对 5 个人的数据集进行了采样，然后检查他们是否有共同的可用时间段。如果他们这样做，那么我将它们从数据集中删除并重复该过程。如果他们没有可用的公共时间段，我会重新抽样另外 5 个人并继续尝试，直到找到 5 个具有公共时间段的样本。如果在 1000 次重采样后，它无法找到满足其停止条件的 5 个样本。

这对我来说似乎效率很低，我想知道是否有更好的方法来做到这一点。

Answer 1

是的，您的 Monte Carlo 解决方案非常耗时。根据可用插槽的密度，它可能存在致命缺陷。

相反，构造个可行集。对于每个槽，建立一组所有可用的人。现在，您需要做的就是遍历该集合的所有 5 成员子集。对每个时隙重复此操作。您现在有所有组五名成员，至少有一个共同的时间段。

Answer 2

我会根据以下伪代码：

For every timeslot count the number of available people.
Sort the timeslot in ascending order of available people.

For every person, count the number of available timeslots.
Sort the people in ascending order of available timeslots.

While people are available:
    Enumerate the sorted list of timeslots:
        Repeat for the timeslot as long as 5+ people are available
            Collect five people available for this timeslot
            Remove them as one group from the list of people.
            Decrease the avaiblability count of the timeslot. 
    Break the while loop if no group was formed during the iteration

排序背后的基本原理是首先使用稀有的时间段。首先分配选择很少的人。这给拥挤的时段和人们留下了很多选择为最后几轮，从而扩大更多组的机会。

根据可用时间段/日历将人们分组的算法

Algorithm to group people together based on their available timeslots/ calendar

python

algorithm

grouping

matching