Optaplanner config with moveThreadCount=1 not same as no moveThreadCount

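I've upgraded to Optaplanner 7.12 and, while hunting for a potential issue with multithreading mixed with VariableListeners, noticed an oddity in reproducible execution: if the configuration file has <moveThreadCount>1</moveThreadCount>, the execution is not the same as when the moveThreadCount line is absent. That seems unexpected to me as a user, and it might be intertwined with the potential Optaplanner race condition I'm seeing (noted at the end of this post).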

Code details

I observed this in REPRODUCIBLE mode with a fixed seed in the configuration file: <environmentMode>REPRODUCIBLE</environmentMode> <randomSeed>50</randomSeed>

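The difference in Optaplanner behavior shows up in my use of a VariableListener. I have a set of custom MoveFactory classes for a model derived from nurse rostering. Each custom factory generates a different set of moves for each STEP, and every factory bases its moves on a common set of state-dependent, computationally intensive precomputations. I created a MoveFactoryHelper class that does the precomputations, and each custom MoveFactory calls the helper at the start of its createMoveList method (I have not yet tried migrating to the newer Optaplanner iterative move generation option).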

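To avoid repeating the computation for every move factory, the MoveFactoryHelper stores its results for reuse. It decides when to recompute based on a 'dirty' flag that is set by a VariableListener registered on an (otherwise completely unused) shadow variable of the model's PlanningEntity, and that the MoveFactoryHelper clears whenever it recomputes: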

ShiftAssignment.java

    @PlanningEntity(movableEntitySelectionFilter = MovableShiftAssignmentSelectionFilter.class,
        difficultyComparatorClass = ShiftAssignmentDifficultyComparator.class)
    @XStreamAlias("ShiftAssignment")
    public class ShiftAssignment extends AbstractPersistable {

        ...

        @PlanningVariable(valueRangeProviderRefs = {"employeeRange"},
            strengthComparatorClass = EmployeeStrengthComparator.class
            )
        private Employee employee;

        ...

        @CustomShadowVariable( variableListenerClass=UpdatingEmployeeVariableListener.class, 
            sources={@PlanningVariableReference(variableName="employee", entityClass=ShiftAssignment.class)})
        private Employee notifierEmployee;  // TODO is there a better way to notify move factory of changes in problem facts?

UpdatingEmployeeVariableListener.java

    private static final Logger logger = LoggerFactory.getLogger(UpdatingEmployeeVariableListener.class);

    private static final boolean initiallyDirty = true;
    private static Map<Thread, Boolean> employeeShiftAssignmentEntityDirty = new HashMap<Thread, Boolean>();
    private static Map<Thread, Boolean> employeeShiftAssignmentMapDirty = new HashMap<Thread, Boolean>();

    private static final boolean useThreadFlags = false;

    // debug monitoring
    private static Map<Thread, Integer> countDirtyAllFlags = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countBeforeEntityAdded = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countAfterEntityAdded = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countBeforeVariableChanged = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countAfterVariableChanged = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countBeforeEntityRemoved = new HashMap<Thread, Integer>();
    private static Map<Thread, Integer> countAfterEntityRemoved = new HashMap<Thread, Integer>();

    public UpdatingEmployeeVariableListener() {
        // no action
    }

    private static Thread getActiveThread() {
        return useThreadFlags ? Thread.currentThread() : null;
    }

    public static void setFlagsDirty() {
        countDirtyAllFlags.put(getActiveThread(), 1+countDirtyAllFlags.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentEntityDirty.put(getActiveThread(), true);
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void beforeEntityAdded(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
        countBeforeEntityAdded.put(getActiveThread(), 1+countBeforeEntityAdded.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void afterEntityAdded(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
        countAfterEntityAdded.put(getActiveThread(), 1+countAfterEntityAdded.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void beforeVariableChanged(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector,
            ShiftAssignment entity) {
        countBeforeVariableChanged.put(getActiveThread(), 1+countBeforeVariableChanged.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void afterVariableChanged(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector,
            ShiftAssignment entity) {
        countAfterVariableChanged.put(getActiveThread(), 1+countAfterVariableChanged.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void beforeEntityRemoved(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
        countBeforeEntityRemoved.put(getActiveThread(), 1+countBeforeEntityRemoved.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    @Override
    public void afterEntityRemoved(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
        countAfterEntityRemoved.put(getActiveThread(), 1+countAfterEntityRemoved.getOrDefault(getActiveThread(), 0));
        employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
    }

    /**
     * @return the employeeShiftAssignmentEntityDirty
     */
    public static boolean isEmployeeShiftAssignmentEntityDirty() {
        return employeeShiftAssignmentEntityDirty.getOrDefault(getActiveThread(), initiallyDirty);
    }

    /**
     * clears isEntityDirty, implying that the (externally maintained) employee shift assignment entity list has been updated 
     */
    public static void clearEmployeeShiftAssignmentEntityDirty() {
        employeeShiftAssignmentEntityDirty.put(getActiveThread(), false);       
    }

    /**
     * @return the mapDirty (which is depending also on entityDirty)
     */
    public static boolean isEmployeeShiftAssignmentMapDirty() {
        return employeeShiftAssignmentMapDirty.getOrDefault(getActiveThread(), initiallyDirty) || isEmployeeShiftAssignmentEntityDirty();
    }

    /**
     * clears isMapDirty, implying that the (externally maintained) employee shift assignment map has been updated (as well as the underlying entity) 
     */
    public static void clearEmployeeShiftAssignmentMapDirty() {
        clearEmployeeShiftAssignmentEntityDirty();
        employeeShiftAssignmentMapDirty.put(getActiveThread(), false);
        logger.debug("Clearing dirty flag: (AF={}, BEA={}, AEA={}, BVC={}, AVC={}, BER={}, AER={}) thread={}, employeeShiftAssignmentEntityDirty={}, employeeShiftAssignmentMapDirty={}", 
                countDirtyAllFlags.getOrDefault(getActiveThread(), 0),
                countBeforeEntityAdded.getOrDefault(getActiveThread(), 0),
                countAfterEntityAdded.getOrDefault(getActiveThread(), 0),
                countBeforeVariableChanged.getOrDefault(getActiveThread(), 0),
                countAfterVariableChanged.getOrDefault(getActiveThread(), 0),
                countBeforeEntityRemoved.getOrDefault(getActiveThread(), 0),
                countAfterEntityRemoved.getOrDefault(getActiveThread(), 0),
                getActiveThread(),
                employeeShiftAssignmentEntityDirty, 
                employeeShiftAssignmentMapDirty);
        clearCounts();
    }

    private static void clearCounts() {
        countDirtyAllFlags.put(getActiveThread(), 0);
        countBeforeEntityAdded.put(getActiveThread(), 0);
        countAfterEntityAdded.put(getActiveThread(), 0);
        countBeforeVariableChanged.put(getActiveThread(), 0);
        countAfterVariableChanged.put(getActiveThread(), 0);
        countBeforeEntityRemoved.put(getActiveThread(), 0);
        countAfterEntityRemoved.put(getActiveThread(), 0);
    }
}

(Note that the maps of booleans and of integers here are effectively single booleans and integers, because useThreadFlags is final and false.)
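To make that note concrete, here is a minimal sketch of the same pattern in isolation. The class and method names are hypothetical and are not part of my actual listener. It relies only on the fact that java.util.HashMap permits a null key: when every lookup uses null, all threads share one entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone demo; not the real UpdatingEmployeeVariableListener.
public class NullKeyFlagDemo {
    // Mirrors the listener's switch: false means every lookup uses the null key.
    static final boolean USE_THREAD_FLAGS = false;
    static final Map<Thread, Boolean> dirty = new HashMap<>();

    static Thread activeThread() {
        return USE_THREAD_FLAGS ? Thread.currentThread() : null;
    }

    static void markDirty() { dirty.put(activeThread(), true); }
    static void clear() { dirty.put(activeThread(), false); }
    static boolean isDirty() { return dirty.getOrDefault(activeThread(), true); }

    // Clears the flag on the calling thread, dirties it from a second thread,
    // waits for that thread to finish, then reads the flag back on the caller.
    static boolean dirtiedFromOtherThread() {
        clear();
        Thread other = new Thread(NullKeyFlagDemo::markDirty);
        other.start();
        try {
            other.join(); // join() makes the other thread's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return isDirty();
    }

    public static void main(String[] args) {
        System.out.println(dirtiedFromOtherThread()); // true
        System.out.println(dirty.size());             // 1
    }
}
```

So with useThreadFlags=false, a write from any thread lands in the one shared entry. Note that a plain HashMap is not safe for concurrent writes, which matters if more than one thread ever reaches the listener.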

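I confirmed that only the MoveFactory objects call the MoveFactoryHelper. Likewise, apart from the VariableListener annotation above and the flag queries/clears from the MoveFactoryHelper, the only other calls into UpdatingEmployeeVariableListener are the calls to UpdatingEmployeeVariableListener.setFlagsDirty() before solving starts: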

        @Override
        public void actionPerformed(ActionEvent e) {
            UpdatingEmployeeVariableListener.setFlagsDirty();
            setSolvingState(true);
            Solution_ problem = solutionBusiness.getSolution();
            new SolveWorker(problem).execute();
        }

and after solving stops:

    solver.terminateEarly();
    UpdatingEmployeeVariableListener.setFlagsDirty();

The maps-by-thread arrangement is new, but the underlying use of the boolean flags has been running successfully for years:

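  1. The flags are set dirty by Optaplanner's beforeVariableChanged and afterVariableChanged calls on the planning entity.
  2. The first MoveFactory calls the MoveFactoryHelper, which calls UpdatingEmployeeVariableListener.isEmployeeShiftAssignmentMapDirty() and gets true. The MoveFactoryHelper recomputes from the current state and then calls to clear the dirty flags.
  3. The remaining MoveFactory objects call the MoveFactoryHelper, which gets false from the is...Dirty() query and can therefore reuse its computations.
  4. Optaplanner tests many candidate moves, which dirties the flags again. After a move has been picked for the step, the MoveFactory.createMoveList methods are called again early in the next step, and the cycle repeats.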
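The cycle above can be sketched in a few lines. This is a simplified stand-in, not my real MoveFactoryHelper: the class name, the String "cache", and the step() driver are all hypothetical, and the real precomputation is far more expensive.

```java
// Hypothetical, simplified model of the dirty-flag cycle described above.
public class DirtyFlagCycleSketch {
    // Stand-in for the flag kept by the variable listener; initially dirty.
    private boolean dirty = true;
    // Counts how often the expensive precomputation actually runs.
    int rebuilds = 0;
    private String cached = "";

    // Stand-in for a beforeVariableChanged/afterVariableChanged notification.
    void variableChanged() {
        dirty = true;
    }

    // Stand-in for the helper call at the top of each createMoveList.
    String precomputed() {
        if (dirty) {
            rebuilds++;
            cached = "precomputed state #" + rebuilds; // expensive work goes here
            dirty = false;                             // clear after recomputing
        }
        return cached;
    }

    // One solver step: every factory asks the helper, then candidate moves
    // are evaluated, each of which dirties the flag again.
    void step(int factories, int candidateMoves) {
        for (int i = 0; i < factories; i++) {
            precomputed();
        }
        for (int i = 0; i < candidateMoves; i++) {
            variableChanged();
        }
    }

    public static void main(String[] args) {
        DirtyFlagCycleSketch sketch = new DirtyFlagCycleSketch();
        sketch.step(10, 300);
        sketch.step(10, 24);
        sketch.step(10, 110);
        System.out.println(sketch.rebuilds); // 3
    }
}
```

In this model there is exactly one rebuild per step, however many factories ask, which is the behavior I expect to see in the logs.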

Log details showing the odd Optaplanner behavior

With the upgrade to 7.12, and with no moveThreadCount xml element defined in the configuration, the code continues to run correctly and reproducibly:

11:20:37.274 INFO  Solving started: time spent (422), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
11:20:37.280 DEBUG     CH step (0), time spent (428), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
11:20:37.280 INFO  Construction Heuristic phase (0) ended: time spent (428), best score (0hard/-5340soft), score calculation speed (1000/sec), step total (1).

11:20:37.561 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:44.303 DEBUG     LS step (0), time spent (7451), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
11:20:44.310 DEBUG Factories(10) STEP moves: 1594020

11:20:44.312 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13800, AVC=13800, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:46.609 DEBUG     LS step (1), time spent (9757), score (0hard/-5266soft),     best score (0hard/-4919soft), accepted/selected move count (1/24), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
11:20:46.610 DEBUG Factories(10) STEP moves: 473969

11:20:46.613 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=746, AVC=746, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:48.124 DEBUG     LS step (2), time spent (11272), score (0hard/-5083soft),     best score (0hard/-4919soft), accepted/selected move count (1/110), picked move ((CloseSlack-newEmplS) 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
11:20:48.124 DEBUG Factories(10) STEP moves: 477083

(The Factories debug log line after each step just shows how many moves the 10 custom factories offered to the solver in the previous step.)

However, when I add the <moveThreadCount>1</moveThreadCount> line to the configuration file, I see intermittent calls to rebuild the MoveFactoryHelper in the middle of Optaplanner's variable changes (see LS step (2) below):

10:46:05.413 INFO  Solving started: time spent (360), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
10:46:05.746 DEBUG     CH step (0), time spent (693), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
10:46:05.746 INFO  Construction Heuristic phase (0) ended: time spent (693), best score (0hard/-5340soft), score calculation speed (9/sec), step total (1).

10:46:05.949 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:13.014 DEBUG     LS step (0), time spent (7961), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
10:46:13.019 DEBUG Factories(10) STEP moves: 1594020

10:46:13.021 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13844, AVC=13844, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:14.741 DEBUG     LS step (1), time spent (9688), score (0hard/-5266soft),     best score (0hard/-4919soft), accepted/selected move count (1/19), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
10:46:14.741 DEBUG Factories(10) STEP moves: 473969

10:46:14.743 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=582, AVC=582, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:14.743 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=20, AVC=20, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:16.444 DEBUG     LS step (2), time spent (11391), score (0hard/-5083soft),     best score (0hard/-4919soft), accepted/selected move count (1/97), picked move ((CloseSlack-newEmplS) {...} 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
10:46:16.445 DEBUG Factories(10) STEP moves: 1580032

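Two comments. First, there is some loss of reproducible execution: note, for example, that there were originally 13800 before/after variable changes and now there are 13844. I assume this is related to the "enabling" of multithreading, even though only one thread is used.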

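Second, the number and details of the "split" variable changes, where two calls to clear the dirty flags are seen (each after rebuilding the MoveFactoryHelper), vary somewhat from run to run, which leads me to believe this is a multithreading race issue, e.g.: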

12:16:27.712 INFO  Solving started: time spent (375), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
12:16:28.043 DEBUG     CH step (0), time spent (706), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
12:16:28.043 INFO  Construction Heuristic phase (0) ended: time spent (706), best score (0hard/-5340soft), score calculation speed (9/sec), step total (1).

12:16:28.288 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:35.148 DEBUG     LS step (0), time spent (7811), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
12:16:35.158 DEBUG Factories(10) STEP moves: 1594020

12:16:35.160 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13821, AVC=13821, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:35.160 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:37.050 DEBUG     LS step (1), time spent (9713), score (0hard/-5266soft),     best score (0hard/-4919soft), accepted/selected move count (1/22), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
12:16:37.053 DEBUG Factories(10) STEP moves: 1576812

12:16:37.054 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=763, AVC=763, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:37.055 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=23, AVC=23, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:39.414 DEBUG     LS step (2), time spent (12077), score (0hard/-5083soft),     best score (0hard/-4919soft), accepted/selected move count (1/98), picked move ((CloseSlack-newEmplS) {...} 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
12:16:39.414 DEBUG Factories(10) STEP moves: 1580534
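The paired "Clearing dirty flag" lines would be consistent with the helper rebuilding while some variable-change notifications from the previous step are still arriving. The sketch below replays that interleaving deterministically on a single thread, using the counts from the 10:46:14.743 log lines (582, then 20). The class and method names are hypothetical, and the late notifications are my assumption, not something I have confirmed in Optaplanner's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded replay of the suspected interleaving.
public class SplitClearReplay {
    private boolean dirty = false;
    private int changesSinceClear = 0;
    // Change counts reported at each clear, like the BVC/AVC fields in the log.
    final List<Integer> reportedAtClear = new ArrayList<>();

    // Stand-in for one beforeVariableChanged/afterVariableChanged pair.
    void variableChanged() {
        dirty = true;
        changesSinceClear++;
    }

    // Stand-in for a factory asking the helper: rebuild and clear only if dirty.
    void factoryAsksHelper() {
        if (dirty) {
            reportedAtClear.add(changesSinceClear);
            changesSinceClear = 0;
            dirty = false;
        }
    }

    public static void main(String[] args) {
        SplitClearReplay replay = new SplitClearReplay();
        for (int i = 0; i < 582; i++) {
            replay.variableChanged();   // changes from the previous step
        }
        replay.factoryAsksHelper();     // first factory rebuilds and clears
        for (int i = 0; i < 20; i++) {
            replay.variableChanged();   // assumed late notifications
        }
        replay.factoryAsksHelper();     // next factory sees dirty, rebuilds again
        replay.factoryAsksHelper();     // remaining factories reuse the result
        System.out.println(replay.reportedAtClear); // [582, 20]
    }
}
```

If this reading is right, the second rebuild is triggered by changes that logically belong to the previous step, and how many of them arrive late would depend on thread timing.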

Thus my two questions:

  1. Why does Optaplanner not behave the same with no moveThreadCount defined as with moveThreadCount defined as 1? That seems unexpected to a user.

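  2. What might I or Optaplanner be doing that causes the custom MoveFactory to be called (to generate Move lists) early, before all of the variable changes from the previous Optaplanner step have completed, even in a single-thread configuration? I'm wondering whether the "chosen move" is applied and the new createMoveList call begins before the last move-scoring/testing threads for the previous step's moves have all paused. Even if so, I don't know why that would make the execution non-reproducible here, unless the still-running threads are doing random move selection (which would seemingly make the overall execution non-reproducible).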

This occurs in both the "run" and "debug" execution environments.

Thank you.

This appears to have been resolved in Optaplanner 7.15.0. Updating to that version resolves the issue.