通过并行分组最小化作业调度
Job scheduling with minimization by parallel grouping
我有一个带有扭曲的作业调度问题 - 最小化约束。任务是-我有很多工作,每个工作都对其他工作有不同的依赖性,没有周期。这些工作也是有分类的,属于同一分类的可以免费并列运行。所以,我想对作业进行排序,以便每个作业都在其依赖项之后出现,但是 运行 以这样一种方式将它们按类别分组(运行 许多并行)以最小化数量系列工作我 运行。也就是说,同一类别的相邻作业算作一个连续作业。
我知道我可以按拓扑排序来处理依赖项排序。我试过在包含每个工作类别的子图上使用图形着色,但我 运行 遇到了类别间依赖冲突的问题。更具体地说,当我必须决定将两对或多对工作中的哪对分组时。我可以暴力破解,我可以尝试 运行dom 遍历搜索 space,但我希望有更智能的东西。前者在最坏的情况下呈指数级增长,后者不一定是最优的。
按比例计算 - 一次可以安排多达几十万个工作,可能有几百个工作类别。
我偶然发现了许多优化,例如创建依赖关系图、拆分成连接的组件以及独立解决每个子问题并合并。我还意识到每个类别的颜色数量都有一个下限,但不确定如何在提前退出条件之外使用它。
是否有更好的方法来找到排序或作业以最大化类别作业的这种“分组”,以最小化串行作业的总数?
不确定这是否有帮助,但与其针对算法,还可以开发优化模型并让求解器完成工作。
混合整数规划模型可能如下所示:
我们的想法是最小化总完工时间,即最新作业的完成时间。这将自动尝试将同一类别的作业组合在一起(以允许并行处理)。
我为 50 个职位和 5 个类别创建了一些随机数据。数据集包括一些截止日期和一些优先约束。
---- 28 SET j jobs
job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12
job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24
job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36
job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48
job49, job50
---- 28 SET c category
cat1, cat2, cat3, cat4, cat5
---- 28 SET jc job-category mapping
cat1 cat2 cat3 cat4 cat5
job1 YES
job2 YES
job3 YES
job4 YES
job5 YES
job6 YES
job7 YES
job8 YES
job9 YES
job10 YES
job11 YES
job12 YES
job13 YES
job14 YES
job15 YES
job16 YES
job17 YES
job18 YES
job19 YES
job20 YES
job21 YES
job22 YES
job23 YES
job24 YES
job25 YES
job26 YES
job27 YES
job28 YES
job29 YES
job30 YES
job31 YES
job32 YES
job33 YES
job34 YES
job35 YES
job36 YES
job37 YES
job38 YES
job39 YES
job40 YES
job41 YES
job42 YES
job43 YES
job44 YES
job45 YES
job46 YES
job47 YES
job48 YES
job49 YES
job50 YES
---- 28 PARAMETER length job duration
job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413
job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502
job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430
job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255
job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513
job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699
job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554
job50 10.763
---- 28 SET before dependencies
job3 job9 job13 job21 job23 job27 job32 job41 job42
job1 YES
job3 YES
job4 YES
job8 YES
job9 YES YES
job12 YES
job14 YES
job21 YES
job26 YES
job31 YES
+ job43 job46 job48
job10 YES YES
job11 YES
---- 28 PARAMETER due some jobs have a due date
job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012
解决方案可能如下所示:
此模型(使用此特定数据集)在大约 30 秒内求解(使用 Cplex)。当然要注意的是,一般来说,这些模型可能很难求解到最优。
对于作业调度,我鼓励您查看 CPLEX 中的 CPOptimizer introduction to CPOptimizer
基本的工作车间模型如下所示
using CP;
int nbJobs = ...;
int nbMchs = ...;
range Jobs = 0..nbJobs-1;
range Mchs = 0..nbMchs-1;
// Mchs is used both to index machines and operation position in job
tuple Operation {
int mch; // Machine
int pt; // Processing time
};
Operation Ops[j in Jobs][m in Mchs] = ...;
dvar interval itvs[j in Jobs][o in Mchs] size Ops[j][o].pt;
dvar sequence mchs[m in Mchs] in all(j in Jobs, o in Mchs : Ops[j][o].mch == m) itvs[j][o];
minimize max(j in Jobs) endOf(itvs[j][nbMchs-1]);
subject to {
forall (m in Mchs)
noOverlap(mchs[m]);
forall (j in Jobs, o in 0..nbMchs-2)
endBeforeStart(itvs[j][o], itvs[j][o+1]);
}
如 sched_jobshop 示例所示
这是一个 CP Optimizer 模型,它使用最新的 12.10 版本(几秒钟)求解得非常快。
该模型非常自然地使用优先约束和 "state function" 来模拟批处理约束(不同类别的两个任务不能同时执行)。
DURATION = [
11611, 12558, 11274, 7839, 5864, 6025, 11413, 10453, 5315, 12924,
5728, 6757, 10256, 12502, 6781, 5341, 10851, 11212, 8894, 8587,
7430, 7464, 6305, 14334, 8799, 12834, 8000, 6255, 12489, 5692,
7020, 5051, 7696, 9999, 6513, 6742, 8306, 8169, 8221, 14640,
14936, 8699, 8729, 12720, 8967, 14131, 6196, 12355, 5554, 10763
]
CATEGORY = [
1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]
PREC = [
(0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
(13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]
DEADLINE = [ (15, 50756), (18, 57757), (19, 58797),
(24, 74443), (28, 65605), (31, 55928), (49, 58012) ]
assert(len(CATEGORY) == len(DURATION))
# ===========================================================================
from docplex.cp.model import CpoModel
mdl = CpoModel()
TASKS = range(len(DURATION))
# Decision variables - interval variables with duration (length) and name
itv = [
mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1))
for j in TASKS
]
# Deadlines - constrain the end of the interval.
for j,d in DEADLINE :
mdl.add(mdl.end_of(itv[j]) <= d)
# Precedences - use end_before_start
for b, a in PREC :
mdl.add(mdl.end_before_start(itv[b], itv[a]))
# Batching. This uses a "state function" which is an unknown function of
# time which needs to be decided by CP Optimizer. We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal meaning the state function is always equal to a value
# over the extent of the interval). This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS :
mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))
# Objective. Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))
# Solve it, making sure we get the absolute optimal (0 tolerance)
# and limiting the log a bit. 's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")
# Get the final makespan
sol_makespan = s.get_objective_values()[0]
# Print the solution by zone
# s[X] gets the value of unknown X in the solution s
# s[r] gets the value of the state function in the solution
# this is a list of triples (start, end, value) representing
# the full extent of the state function over the whole time line.
zones = s[r]
# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1] :
print("Category is {} in window [{},{})".format(value, start, end))
for j in TASKS:
(istart, iend, ilength) = s[itv[j]] # intervals are start/end/length
if istart >= start and iend <= end:
print("\t{} @ {} -- {} --> {}".format(
itv[j].get_name(), istart, ilength, iend))
我有一个带有扭曲的作业调度问题 - 最小化约束。任务是-我有很多工作,每个工作都对其他工作有不同的依赖性,没有周期。这些工作也是有分类的,属于同一分类的可以免费并列运行。所以,我想对作业进行排序,以便每个作业都在其依赖项之后出现,但是 运行 以这样一种方式将它们按类别分组(运行 许多并行)以最小化数量系列工作我 运行。也就是说,同一类别的相邻作业算作一个连续作业。
我知道我可以按拓扑排序来处理依赖项排序。我试过在包含每个工作类别的子图上使用图形着色,但我 运行 遇到了类别间依赖冲突的问题。更具体地说,当我必须决定将两对或多对工作中的哪对分组时。我可以暴力破解,我可以尝试 运行dom 遍历搜索 space,但我希望有更智能的东西。前者在最坏的情况下呈指数级增长,后者不一定是最优的。
按比例计算 - 一次可以安排多达几十万个工作,可能有几百个工作类别。
我偶然发现了许多优化,例如创建依赖关系图、拆分成连接的组件以及独立解决每个子问题并合并。我还意识到每个类别的颜色数量都有一个下限,但不确定如何在提前退出条件之外使用它。
是否有更好的方法来找到排序或作业以最大化类别作业的这种“分组”,以最小化串行作业的总数?
不确定这是否有帮助,但与其针对算法,还可以开发优化模型并让求解器完成工作。
混合整数规划模型可能如下所示:
我们的想法是最小化总完工时间,即最新作业的完成时间。这将自动尝试将同一类别的作业组合在一起(以允许并行处理)。
我为 50 个职位和 5 个类别创建了一些随机数据。数据集包括一些截止日期和一些优先约束。
---- 28 SET j jobs
job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12
job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24
job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36
job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48
job49, job50
---- 28 SET c category
cat1, cat2, cat3, cat4, cat5
---- 28 SET jc job-category mapping
cat1 cat2 cat3 cat4 cat5
job1 YES
job2 YES
job3 YES
job4 YES
job5 YES
job6 YES
job7 YES
job8 YES
job9 YES
job10 YES
job11 YES
job12 YES
job13 YES
job14 YES
job15 YES
job16 YES
job17 YES
job18 YES
job19 YES
job20 YES
job21 YES
job22 YES
job23 YES
job24 YES
job25 YES
job26 YES
job27 YES
job28 YES
job29 YES
job30 YES
job31 YES
job32 YES
job33 YES
job34 YES
job35 YES
job36 YES
job37 YES
job38 YES
job39 YES
job40 YES
job41 YES
job42 YES
job43 YES
job44 YES
job45 YES
job46 YES
job47 YES
job48 YES
job49 YES
job50 YES
---- 28 PARAMETER length job duration
job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413
job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502
job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430
job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255
job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513
job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699
job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554
job50 10.763
---- 28 SET before dependencies
job3 job9 job13 job21 job23 job27 job32 job41 job42
job1 YES
job3 YES
job4 YES
job8 YES
job9 YES YES
job12 YES
job14 YES
job21 YES
job26 YES
job31 YES
+ job43 job46 job48
job10 YES YES
job11 YES
---- 28 PARAMETER due some jobs have a due date
job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012
解决方案可能如下所示:
此模型(使用此特定数据集)在大约 30 秒内求解(使用 Cplex)。当然要注意的是,一般来说,这些模型可能很难求解到最优。
对于作业调度,我鼓励您查看 CPLEX 中的 CPOptimizer introduction to CPOptimizer
基本的工作车间模型如下所示
using CP;
int nbJobs = ...;
int nbMchs = ...;
range Jobs = 0..nbJobs-1;
range Mchs = 0..nbMchs-1;
// Mchs is used both to index machines and operation position in job
tuple Operation {
int mch; // Machine
int pt; // Processing time
};
Operation Ops[j in Jobs][m in Mchs] = ...;
dvar interval itvs[j in Jobs][o in Mchs] size Ops[j][o].pt;
dvar sequence mchs[m in Mchs] in all(j in Jobs, o in Mchs : Ops[j][o].mch == m) itvs[j][o];
minimize max(j in Jobs) endOf(itvs[j][nbMchs-1]);
subject to {
forall (m in Mchs)
noOverlap(mchs[m]);
forall (j in Jobs, o in 0..nbMchs-2)
endBeforeStart(itvs[j][o], itvs[j][o+1]);
}
如 sched_jobshop 示例所示
这是一个 CP Optimizer 模型,它使用最新的 12.10 版本(几秒钟)求解得非常快。 该模型非常自然地使用优先约束和 "state function" 来模拟批处理约束(不同类别的两个任务不能同时执行)。
DURATION = [
11611, 12558, 11274, 7839, 5864, 6025, 11413, 10453, 5315, 12924,
5728, 6757, 10256, 12502, 6781, 5341, 10851, 11212, 8894, 8587,
7430, 7464, 6305, 14334, 8799, 12834, 8000, 6255, 12489, 5692,
7020, 5051, 7696, 9999, 6513, 6742, 8306, 8169, 8221, 14640,
14936, 8699, 8729, 12720, 8967, 14131, 6196, 12355, 5554, 10763
]
CATEGORY = [
1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]
PREC = [
(0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
(13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]
DEADLINE = [ (15, 50756), (18, 57757), (19, 58797),
(24, 74443), (28, 65605), (31, 55928), (49, 58012) ]
assert(len(CATEGORY) == len(DURATION))
# ===========================================================================
from docplex.cp.model import CpoModel
mdl = CpoModel()
TASKS = range(len(DURATION))
# Decision variables - interval variables with duration (length) and name
itv = [
mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1))
for j in TASKS
]
# Deadlines - constrain the end of the interval.
for j,d in DEADLINE :
mdl.add(mdl.end_of(itv[j]) <= d)
# Precedences - use end_before_start
for b, a in PREC :
mdl.add(mdl.end_before_start(itv[b], itv[a]))
# Batching. This uses a "state function" which is an unknown function of
# time which needs to be decided by CP Optimizer. We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal meaning the state function is always equal to a value
# over the extent of the interval). This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS :
mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))
# Objective. Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))
# Solve it, making sure we get the absolute optimal (0 tolerance)
# and limiting the log a bit. 's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")
# Get the final makespan
sol_makespan = s.get_objective_values()[0]
# Print the solution by zone
# s[X] gets the value of unknown X in the solution s
# s[r] gets the value of the state function in the solution
# this is a list of triples (start, end, value) representing
# the full extent of the state function over the whole time line.
zones = s[r]
# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1] :
print("Category is {} in window [{},{})".format(value, start, end))
for j in TASKS:
(istart, iend, ilength) = s[itv[j]] # intervals are start/end/length
if istart >= start and iend <= end:
print("\t{} @ {} -- {} --> {}".format(
itv[j].get_name(), istart, ilength, iend))