这个 CP-SAT 模型可以更快吗？

Question

我的团队正在构建一个 CP-SAT 求解器，它可以在几天内安排作业（想想家庭作业），可用性可变（可用于完成作业的时间）。我们正在努力加快我们的模型。

我们已尝试 num_search_workers 和其他参数调整，但想检查其他速度提升情况。目标是在 5-10 秒内解决约 100 天和最多 2000 个作业的问题（以 M1 mac 为基准）。有什么想法吗？

问题描述：根据这些要求在 d 天内布置作业

一天的分配时间不得超过当天可用的时间
应遵守分配依赖关系（如果 A 需要 B，则 B 不应出现在 A 之后）
可以拆分作业（以便更好地适应时间很少的日子）
针对一天中分配类型的多样性进行优化

# 天和 # 作业的求解速度显着降低。这是预期的，但我们想知道您是否可以建议可能的加速

这是一个示例单元测试。希望显示拆分、排序和时间限制。

days = [{"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}]
assignments = [
    {"id": 1, "resourceType": "Type0", "seconds": 2400, "deps": [], "instances": 2},
    {"id": 2, "resourceType": "Type0", "seconds": 1200, "deps": [1], "instances": 1},
    {"id": 3, "resourceType": "Type0", "seconds": 1200, "deps": [1, 2], "instances": 1},
    ]
result = cp_sat.CP_SAT_FAST.schedule(days, assignments, options=solver_options)
# expect a list of lists where each inner list is a day with the included assignments
expected = shared.SolverOutput(feasible=True, solution=[
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 2, "resourceType": "Type0", "time": 1200, "instances": 1}],
    [{"id": 3, "resourceType": "Type0", "time": 1200, "instances": 1}],
    ])
self.assertEqual(result, expected)

这是求解器：

import math
from typing import List, Dict

from ortools.sat.python import cp_model
import numpy as np

import planner.solvers as solvers
from planner.shared import SolverOutput, SolverOptions


class CP_SAT_FAST(solvers.Solver):
    """
    CP_SAT_FAST is a CP_SAT solver with speed optimizations and a time limit (passed in through options).
    """

    @staticmethod
    def schedule(days: List[Dict], assignments: List[Dict], options: SolverOptions) -> SolverOutput:
        """
        Schedules a list of assignments on a studyplan of days

        Arguments:
        days: list of dicts containing available time for that day
        assignments: list of assignments to place on schedule
        """

        model = cp_model.CpModel()

        num_assignments = len(assignments)
        num_days = len(days)

        # x[d, a] shows is assignment a is on day d
        x = np.zeros((num_days, num_assignments), cp_model.IntVar) 

        # used for resource diversity optimization
        total_resource_types = 4
        unique_today = []

        # upper and lower bounds used for dependency ordering (if a needs b then b must be before or on the day of a)
        day_ub = {}
        day_lb = {}

        # track assignment splitting
        instances = {}
        assignment_times = {}

        id_to_assignment = {}
        for a, asm in enumerate(assignments):

            # track upper and lower bounds
            day_ub[a] = model.NewIntVar(0, num_days, "day_ub")
            day_lb[a] = model.NewIntVar(0, num_days, "day_lb")
            asm["ub"] = day_ub[a]
            asm["lb"] = day_lb[a]
            id_to_assignment[asm["id"]] = asm

            max_instances = min(num_days, asm.get("instances", num_days))
            
            # each assignment must occur at least once
            instances[a] = model.NewIntVar(1, max_instances, f"instances_{a}")
            model.AddHint(instances[a], max_instances)

            # when split keep a decision variable of assignment time
            assignment_times[a] = model.NewIntVar(asm.get("seconds") // max_instances, asm.get("seconds"), f"assignment_time_{a}")
            model.AddDivisionEquality(assignment_times[a], asm.get("seconds"), instances[a])  

        for d in range(num_days):

            time_available = days[d].get("secondsAvailable", 0)
            if time_available <= 0:
                # no assignments on zero-time days
                model.Add(sum(x[d]) == 0)

            else:
                
                # track resource diversity on this day
                type0_today = model.NewBoolVar(f"type0_on_{d}")
                type1_today = model.NewBoolVar(f"type1_on_{d}")
                type2_today = model.NewBoolVar(f"type2_on_{d}")
                type3_today = model.NewBoolVar(f"type3_on_{d}")
                types_today = model.NewIntVar(0, total_resource_types, f"unique_on_{d}")
                
                task_times = []

                for a, asm in enumerate(assignments):

                    # x[d, a] = True if assignment a is on day d
                    x[d, a] = model.NewBoolVar(f"x[{d},{a}]")
                    
                    # set assignment upper and lower bounds for ordering
                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])
                    
                    # track if a resource type is on a day for resource diversity optimization
                    resourceType = asm.get("resourceType")
                    if resourceType == "Type0":
                        model.AddImplication(x[d, a], type0_today)
                    elif resourceType == "Type1":
                        model.AddImplication(x[d, a], type1_today)
                    elif resourceType == "Type2":
                        model.AddImplication(x[d, a], type2_today)
                    elif resourceType == "Type3":
                        model.AddImplication(x[d, a], type3_today)
                    else:
                        raise RuntimeError(f"Unknown resource type {asm.get('resourceType')}")

                    # track of task time (considering splitting), for workload requirements
                    task_times.append(model.NewIntVar(0, asm.get("seconds"), f"time_{a}_on_{d}"))
                    model.Add(task_times[a] == assignment_times[a]).OnlyEnforceIf(x[d, a])

                # time assigned to day d cannot exceed the day's available time
                model.Add(time_available >= sum(task_times))

                # sum the unique resource types on this day for later optimization
                model.Add(sum([type0_today, type1_today, type2_today, type3_today]) == types_today)
                unique_today.append(types_today)


        """
        Resource Diversity:

        Keeps track of what instances of a resource type appear on each day
        and the minimum number of unique resource types on any day. (done above ^)
        
        Then the model objective is set to maximize that minimum
        """
        total_diversity = model.NewIntVar(0, num_days * total_resource_types, "total_diversity")
        model.Add(sum(unique_today) == total_diversity)

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

        # Set objective
        model.Maximize(avg_diversity)


        # Assignment Occurance/Splitting and Dependencies
        for a, asm in enumerate(assignments):
            
            # track how many times an assignment occurs (since we can split)
            model.Add(instances[a] == sum(x[d, a] for d in range(num_days))) 

            # Dependencies 
            for needed_asm in asm.get("deps", []):
                needed_ub = id_to_assignment[needed_asm]["ub"]
                
                # this asm's lower bound must be greater than or equal to the upper bound of the dependency
                model.Add(num_days - asm["lb"] >= needed_ub)

        # Solve
        solver = cp_model.CpSolver()

        # set time limit
        solver.parameters.max_time_in_seconds = float(options.time_limit)
        solver.parameters.preferred_variable_order = 1
        solver.parameters.initial_polarity = 0
        # solver.parameters.stop_after_first_solution = True
        # solver.parameters.num_search_workers = 8

        intermediate_printer = SolutionPrinter()
        status = solver.Solve(model, intermediate_printer)


        print("\nStats")
        print(f"  - conflicts       : {solver.NumConflicts()}")
        print(f"  - branches        : {solver.NumBranches()}")
        print(f"  - wall time       : {solver.WallTime()}s")
        print()

        if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
            sp = []

            for i, d in enumerate(days):
                day_time = 0
                days_tasks = []
                for a, asm in enumerate(assignments):
                    if solver.Value(x[i, a]) >= 1:
                        asm_time = math.ceil(asm.get("seconds") / solver.Value(instances[a]))
                        day_time += asm_time

                        days_tasks.append({"id": asm["id"], "resourceType": asm.get("resourceType"), "time": asm_time, "instances": solver.Value(instances[a])})
                
                sp.append(days_tasks)

            return SolverOutput(feasible=True, solution=sp)
            
        else:
            return SolverOutput(feasible=False, solution=[])


class SolutionPrinter(cp_model.CpSolverSolutionCallback):

    def __init__(self):
        cp_model.CpSolverSolutionCallback.__init__(self)
        self.__solution_count = 0

    def on_solution_callback(self):
        print(f"Solution {self.__solution_count} objective value = {self.ObjectiveValue()}")
        self.__solution_count += 1

Answer 1

在回答您的实际问题之前，我想指出您模型中的一些我怀疑没有按您预期工作的地方。

给定日期的分配类型限制

model.AddImplication(x[d, a], type0_today)

等，如果那天有那种类型的分配，一定要执行type0_today == 1。但是，如果那天没有那种类型的分配，它不会强制执行 type0_today == 0。求解器仍然可以自由选择 type0_today == 1，它会这样做，因为这满足了这个约束，也直接增加了 objective 功能。您可能会发现在您给出的测试用例的最佳解决方案中，所有 type0_today 到 type3_today 变量都是 1 并且 avg_diversity == 4 在最佳解决方案中，即使没有分配输入数据中除 0 以外的任何类型。在建模的早期阶段，检查模型中所有变量值的合理性始终是个好主意。

由于我没有 Python 安装，我将您的模型翻译成 c# 以便能够做一些实验。抱歉，您必须翻译成等效的 Python 代码。我重新制定了对 type0_today 变量的约束，以使用数组 type_today[d, t]（对于天 d 和类型 t）并使用 AddMaxEquality 约束，这对于布尔值variables 相当于所有参与变量的逻辑或：

    // For each day...
    for (int d = 0; d < num_days; d++)
        {
            // ... make a list for each assignment type of all x[d, a] where a has that type.
            List<IntVar>[] assignmentsByType = new List<IntVar>[total_resource_types];
            for (int t = 0; t < total_resource_types; t++)
            {
                assignmentsByType[t] = new List<IntVar>();
            }
            for (int a = 0; a < num_assignments; a++)
            {
                int t = getType(assignments[a].resourceType);
                assignmentsByType[t].Add(x[d, a]);
            }
            // Constrain the types present on the day to be the logical OR of assignments with that type on that day
            for (int t = 0; t < total_resource_types; t++)
            {
                if (assignmentsByType[t].Count > 0)
                {
                    model.AddMaxEquality(type_today[d, t], assignmentsByType[t]); 
                }
                else
                {
                    model.Add(type_today[d, t] == 0);
                }

            }
        }

您将平均多样性计算为

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

由于求解器仅适用于整数变量，因此 avg_diversity 将恰好是值 0、1、2、3 或 4 之一，没有小数部分。约束 AddDivisionEquality 还将确保 total_diversity 是 avg_diversity 和 num_days 的 精确整数倍数 。这是对解决方案的非常严格的限制，并且在我认为您不希望的许多情况下会导致不可行。

例如，avg_diversity == 3、num_days == 20 和 total_diversity == 60 是允许的解决方案，但 total_diversity == 63 是不允许的，尽管该解决方案中有三天具有比 total_diversity == 60.

更高的多样性

相反，我建议您删除变量 avg_diversity 及其约束，并简单地使用 total_diversity 作为您的 objective 函数。由于天数在求解过程中是一个固定常数，因此在不引入人为不可行性的情况下，最大化总多样性将是等效的。

也就是说，这是我的答案。

一般的约束满足问题是一般的 NP 问题，不应期望能很好地扩展。尽管实际上可以快速解决许多特定问题的公式，但输入数据或公式的微小变化可能会将问题推入指数黑洞。除了尝试各种方法以查看哪种方法最适合您的确切问题之外，真的没有其他方法。

虽然这听起来很矛盾，但求解器更容易为强约束问题找到最优解，而不是轻约束问题（假设它们是可行的！）。强约束问题中的搜索 space 比轻约束问题中的搜索要小，因此求解器在试验优化方面的选择较少，因此可以更快地完成工作。

第一个建议

在您的问题中，每个赋值都有变量 day_ub 和 day_lb。它们的范围从 0 到 num_days。对他们的约束

                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])

允许求解器自由选择 0 和最大 d 之间的任何值。最大的(num_days - d)（含）。在优化过程中，求解器可能会花时间为这些变量尝试不同的值，但很少发现它会带来改进；只有当依赖分配的位置发生变化时才会发生这种情况。

您可以消除变量 day_ub 和 day_lb 及其约束，而是直接用 x 变量制定依赖关系。

在我的 C# 模型中，我重新表述了赋值依赖约束，如下所示：

for (int a = 0; a < num_assignments; a++)
            {
                Assignment assignment = assignments[a];
                foreach (int predecessorIndex in getPredecessorAssignmentIndicesFor(assignment))
                {
                    for (int d1 = 0; d1 < num_days; d1++)
                    {
                        for (int d2 = 0; d2 < d1; d2++)
                        {
                            model.AddImplication(x[d1, predecessorIndex], x[d2, a].Not());
                        }
                    }
                }
            }

换句话说：如果分配A（a）所依赖的分配B（predecessorIndex）被放置在第d1天，那么所有x[0..d1, a]必须是假的。这直接使用 x 变量关联依赖关系，而不是引入具有额外自由度的帮助变量，这会使求解器陷入困境。此更改减少了问题中的变量数量并增加了约束数量，这两者都有助于求解器。

在我用 25 天和 35 个作业进行的实验中，检查了显示的模型统计数据

原文：

#Variables: 2020
#kIntDiv:   35
#kIntMax:   100
#kLinear1:  1750
#kLinear2:  914
#kLinearN:  86
Total constraints   2885

新配方：

#Variables: 1950
#kBoolOr:   11700
#kIntDiv:   35
#kIntMax:   100
#kLinear2:  875
#kLinearN:  86
Total constraints   12796

因此新公式的变量更少但限制更多。

改进了实验中的求解时间，求解器只用了 2.6 秒就达到了 total_diversity == 68 而不是超过 90 秒。

原配方

Time    Objective
0,21    56
0,53    59
0,6 60
0,73    61
0,75    62
0,77    63
2,9 64
3,82    65
3,84    66
91,01   67
91,03   68
91,05   69

新配方

Time    Objective
0,2347  41
0,3066  42
0,4252  43
0,4602  44
0,5014  49
0,6437  50
0,6777  51
0,6948  52
0,7108  53
0,9593  54
1,0178  55
1,1535  56
1,2023  57
1,2351  58
1,2595  59
1,2874  60
1,3097  61
1,3325  62
1,388   63
1,5698  64
2,4948  65
2,5993  66
2,6198  67
2,6431  68
32,5665 69

当然，您获得的解决方案时间将在很大程度上取决于输入数据。

第二条建议

在我的实验过程中，我观察到当分配具有很多依赖性时，可以更快地找到解决方案。这与更容易求解的更高约束模型是一致的。

如果您经常有相同类型和持续时间的分配（例如测试数据中的数字 2 和 3）并且它们都有实例 == 1` 并且没有依赖关系或相同的依赖关系，那么交换它们的位置在解决方案中不会改善 objective.

在预处理步骤中，您可以查找此类重复项并使其中一个依赖于另一个。这本质上是一个对称破坏约束。这将防止求解器浪费时间尝试查看交换位置是否会改善 objective.

第三条建议

解决方案需要确定每个分配的实例将出现在解决方案中的数量。每个赋值需要两个变量 instances[a] 和 assignment_times[a] 以及相关约束。

您可以去掉变量 instances[a] 和 assignment_times[a]，而不是在预处理步骤中将具有 instances > 1 的赋值拆分为多个赋值，而不是这样做。例如，在您的测试数据中，作业 1 将分为两个作业 1_1 和 1_2，每个作业都有 instances == 1 和 seconds = 1200。对于分配 1 instances == 2 的测试用例，这不会对最终解决方案产生任何影响——也许求解器会在同一天安排 1_1 和 1_2，也许不会，但最终结果等同于分裂与否，但不需要额外的变量。

预处理步骤中，拆分assignment时，需要加入对称破缺约束，使1_2依赖于1_1等，原因如前所述

当一个assignment有instances > 2的时候，在运行之前拆分成多个assignment实际上是对模型的一种改变。例如，如果 instances == 3 和 seconds = 2400 你不能得到一个解决方案，其中分配被分成两天，每天 1200 秒；求解器将始终安排 3 个分配，每个分配 800 秒。

所以这个建议实际上是对模型的更改，您必须确定是否可以接受。

放置更多的分配实例通常会有助于总体多样性，因此更改可能不会产生很大的实际影响。它还允许在一天安排 2/3 的作业，在另一天安排剩余的 1/3，因此它甚至增加了一些灵活性。

但就您的整体要求而言，这可能会或可能不会被接受。

在所有情况下，您都必须使用您的确切数据测试更改，看看它们是否真的会带来改进。

我希望这会有所帮助（而且这是一个现实世界的问题，而不是家庭作业，因为我确实花了几个小时调查...）。

这个 CP-SAT 模型可以更快吗？

Could this CP-SAT model be faster?

python

operations-research

or-tools