Optaplanner 基准测试预热 - 内存不足

Optaplanner's benchmark warm up - OutOfMemory

在尝试使用基准配置测试解决方案的求解器时,我遇到了以下异常:

2021-12-22 15:24:37.328  WARN 22684 --- [    Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.

java.lang.OutOfMemoryError: Java heap space

2021-12-22 15:24:37.329  WARN 22684 --- [    Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.

java.lang.OutOfMemoryError: Java heap space

2021-12-22 15:24:37.330  WARN 22684 --- [    Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.lang.Long.valueOf(Long.java:1207)
    at myrostering.solver.PEC.LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.apply(LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.java:69)
    at myrostering.solver.PEC.LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.apply(LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.java:1)
    at org.drools.model.functions.Function1$Impl.apply(Function1.java:35)
    at org.drools.modelcompiler.constraints.LambdaReadAccessor.getValue(LambdaReadAccessor.java:42)
    at org.drools.core.rule.Declaration.getValue(Declaration.java:258)
    at org.drools.core.rule.Declaration.getValue(Declaration.java:253)
    at org.drools.modelcompiler.constraints.BindingEvaluator.getArgument(BindingEvaluator.java:59)
    at org.drools.modelcompiler.constraints.ConstraintEvaluator$InnerEvaluator.getArgument(ConstraintEvaluator.java:242)
    at org.drools.modelcompiler.constraints.ConstraintEvaluator$InnerEvaluator$_2.evaluate(ConstraintEvaluator.java:309)
    at org.drools.modelcompiler.constraints.ConstraintEvaluator.evaluate(ConstraintEvaluator.java:124)
    at org.drools.modelcompiler.constraints.LambdaConstraint.isAllowedCachedLeft(LambdaConstraint.java:187)
    at org.drools.core.common.SingleBetaConstraints.isAllowedCachedLeft(SingleBetaConstraints.java:132)
    at org.drools.core.phreak.PhreakAccumulateNode.doLeftInserts(PhreakAccumulateNode.java:178)
    at org.drools.core.phreak.PhreakAccumulateNode.doNode(PhreakAccumulateNode.java:89)
    at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:591)
    at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:558)
    at org.drools.core.phreak.RuleNetworkEvaluator.evalNode(RuleNetworkEvaluator.java:385)
    at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:345)
    at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:181)
    at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:139)
    at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:235)
    at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:91)
    at org.drools.core.concurrent.AbstractRuleEvaluator.internalEvaluateAndFire(AbstractRuleEvaluator.java:33)
    at org.drools.core.concurrent.SequentialRuleEvaluator.evaluateAndFire(SequentialRuleEvaluator.java:43)
    at org.drools.core.common.DefaultAgenda.fireLoop(DefaultAgenda.java:753)
    at org.drools.core.common.DefaultAgenda.internalFireAllRules(DefaultAgenda.java:700)
    at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:692)
    at org.drools.core.impl.StatefulKnowledgeSessionImpl.internalFireAllRules(StatefulKnowledgeSessionImpl.java:1225)
    at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1216)
    at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1200)
    at org.optaplanner.core.impl.score.director.drools.DroolsScoreDirector.calculateScore(DroolsScoreDirector.java:105)

这里是测试class我运行:

@SpringBootTest(classes = MyApplication.class)
@EnableConfigurationProperties({ApplicationProperties.class, MyRosterProperties.class})
public class SolverBenchmarkTest {

    private PlannerBenchmarkFactory benchmarkFactory = PlannerBenchmarkFactory.createFromXmlResource(
            "myrostering/benchmark/benchmarkSolverConfig.xml");

    @Autowired
    MyRosterGenerator myRosterGenerator;

    @Test
    public void benchmarkBasicRostering() {
        MyRoster mr = myRosterGenerator.createMyRoster();
        PlannerBenchmark benchmark = benchmarkFactory.buildPlannerBenchmark(mr);
        benchmark.benchmarkAndShowReportInBrowser();    
    }

}

这是基准配置文件:

<?xml version="1.0" encoding="UTF-8"?>
<plannerBenchmark xmlns="https://www.optaplanner.org/xsd/benchmark" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="https://www.optaplanner.org/xsd/benchmark https://www.optaplanner.org/xsd/benchmark/benchmark.xsd">

    <benchmarkDirectory>local/benchmark/data/my-roster</benchmarkDirectory>
    <parallelBenchmarkCount>AUTO</parallelBenchmarkCount>
    <warmUpSecondsSpentLimit>30</warmUpSecondsSpentLimit>

    <inheritedSolverBenchmark>
        <solver>
            <!-- This part of the solver configuration must be the same as the one used by the planner, otherwise, the benchmark test is pointless -->
            <moveThreadCount>4</moveThreadCount>
            <solutionClass>myrostering.domain.MyRoster</solutionClass>
            <entityClass>myrostering.domain.Assignment</entityClass>
            <scoreDirectorFactory>
                <scoreDrl>myrostering/solver/myRosteringScoreRules.drl</scoreDrl>
            </scoreDirectorFactory>
            <termination>
                <!-- Adding this secondsSpentLimit (contrary to no limit set for the planner) to avoid the benchmark running for too long -->
                <secondsSpentLimit>60</secondsSpentLimit>
                <bestScoreLimit>0hard/0medium/0soft</bestScoreLimit>
            </termination>
            <constructionHeuristic>
                <constructionHeuristicType>STRONGEST_FIT</constructionHeuristicType>
            </constructionHeuristic>
        </solver>
    </inheritedSolverBenchmark>

    <solverBenchmark>
        <name>Currently used</name>
        <solver>
            <localSearch>
                <unionMoveSelector>
                    <moveListFactory>
                        <cacheType>PHASE</cacheType>
                        <moveListFactoryClass>
                            myrostering.solver.move.factory.ChangeMoveFactory
                        </moveListFactoryClass>
                    </moveListFactory>
                    <moveListFactory>
                        <cacheType>PHASE</cacheType>
                        <moveListFactoryClass>
                            myrostering.solver.move.factory.SwapMoveFactory
                        </moveListFactoryClass>
                    </moveListFactory>
                </unionMoveSelector>
                <acceptor>
                    <entityTabuSize>5</entityTabuSize>
                    <simulatedAnnealingStartingTemperature>15000hard/10medium/1000soft</simulatedAnnealingStartingTemperature>
                </acceptor>
                <forager>
                    <acceptedCountLimit>4</acceptedCountLimit>
                </forager>
            </localSearch>
        </solver>
    </solverBenchmark>
</plannerBenchmark>

此外,我想补充一点,我们 运行 solver.solve() 没有问题 - 即使数据集非常大(包含解决方案的文件为 150 到 300 Mo序列化时)。所以当基准测试在预热时失败时我有点惊讶...

编辑:

我已经更改了这两个参数的配置:

<parallelBenchmarkCount>1</parallelBenchmarkCount>
...
<secondsSpentLimit>600</secondsSpentLimit>

但我仍然遇到以下异常:

2022-01-03 10:53:49.850  INFO 21696 --- [nchmarkThread-1] o.d.c.kie.builder.impl.KieContainerImpl  : Start creation of KieBase: defaultKieBase
2022-01-03 10:53:49.909  INFO 21696 --- [nchmarkThread-1] o.d.c.kie.builder.impl.KieContainerImpl  : End creation of KieBase: defaultKieBase
2022-01-03 10:54:32.585  INFO 21696 --- [nchmarkThread-1] o.o.core.impl.solver.DefaultSolver       : Solving started: time spent (41506), best score (-38295462hard/38260medium/3640soft), environment mode (REPRODUCIBLE), move thread count (4), random (JDK with seed 0).
2022-01-03 10:54:33.611 ERROR 21696 --- [nchmarkThread-1] o.o.core.impl.solver.thread.ThreadUtils  : Multithreaded Local Search's ExecutorService didn't terminate within timeout (1 seconds).
2022-01-03 10:54:33.611  INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner        : Score calculation speed will be too low because move thread (0)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611  INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner        : Score calculation speed will be too low because move thread (1)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611  INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner        : Score calculation speed will be too low because move thread (2)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611  INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner        : Score calculation speed will be too low because move thread (3)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.612  INFO 21696 --- [nchmarkThread-1] .c.i.c.DefaultConstructionHeuristicPhase : Construction Heuristic phase (0) ended: time spent (42533), best score (-38295462hard/38260medium/3640soft), score calculation speed (0/sec), step total (0).
2022-01-03 10:56:00.115  WARN 21696 --- [    Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The subSingleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3480)
    at java.base/java.util.ArrayList.grow(ArrayList.java:237)
    at java.base/java.util.ArrayList.grow(ArrayList.java:244)
    at java.base/java.util.ArrayList.add(ArrayList.java:454)
    at java.base/java.util.ArrayList.add(ArrayList.java:467)
    at be.myrostering.solver.move.factory.MySwapMoveFactory.createMoveList(MySwapMoveFactory.java:50)
    at be.myrostering.solver.move.factory.MySwapMoveFactory.createMoveList(MySwapMoveFactory.java:30)
    at org.optaplanner.core.impl.heuristic.selector.move.factory.MoveListFactoryToMoveSelectorBridge.constructCache(MoveListFactoryToMoveSelectorBridge.java:72)
    at org.optaplanner.core.impl.heuristic.selector.common.SelectionCacheLifecycleBridge.phaseStarted(SelectionCacheLifecycleBridge.java:51)
    at org.optaplanner.core.impl.phase.event.PhaseLifecycleSupport.firePhaseStarted(PhaseLifecycleSupport.java:37)
    at org.optaplanner.core.impl.heuristic.selector.AbstractSelector.phaseStarted(AbstractSelector.java:50)
    at org.optaplanner.core.impl.phase.event.PhaseLifecycleSupport.firePhaseStarted(PhaseLifecycleSupport.java:37)
    at org.optaplanner.core.impl.heuristic.selector.AbstractSelector.phaseStarted(AbstractSelector.java:50)
    at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.phaseStarted(LocalSearchDecider.java:94)
    at org.optaplanner.core.impl.localsearch.decider.MultiThreadedLocalSearchDecider.phaseStarted(MultiThreadedLocalSearchDecider.java:92)
    at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.phaseStarted(DefaultLocalSearchPhase.java:141)
    at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:82)
    at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:99)
    at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:192)
    at org.optaplanner.benchmark.impl.SubSingleBenchmarkRunner.call(SubSingleBenchmarkRunner.java:122)
    at org.optaplanner.benchmark.impl.SubSingleBenchmarkRunner.call(SubSingleBenchmarkRunner.java:42)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.base/java.lang.Thread.run(Thread.java:832)

2022-01-03 10:56:00.603  INFO 21696 --- [    Test worker] o.o.b.impl.report.BenchmarkReport        : Generating benchmark report...

VERSION_2_3_31
java.lang.NoSuchFieldError: VERSION_2_3_31
    at org.optaplanner.benchmark.impl.report.BenchmarkReport.writeHtmlOverviewFile(BenchmarkReport.java:828)
    at org.optaplanner.benchmark.impl.report.BenchmarkReport.writeReport(BenchmarkReport.java:318)
    at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmarkingEnded(DefaultPlannerBenchmark.java:311)
    at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmark(DefaultPlannerBenchmark.java:100)
    at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmarkAndShowReportInBrowser(DefaultPlannerBenchmark.java:424)

最后一点,问题似乎与 Optaplanner 无关(因为内存不足是在 MySwapMoveFactory 中触发的)- 如果是这样,我将关闭此 post。但是,当 运行 求解器而不是基准时它起作用仍然很奇怪......

增加内存,例如使用 VM 选项 -Xmx4g

另请注意,parallelBenchmarkCount AUTO 当前并未考虑 moveThreadCount 不是 NONE。因此,您的基准测试将不准确,因为如果您有 16 个内核,parallelBenchmarkCount AUTO 将解析为 8。使用 moveThreadCount 4(+ 1 个求解器线程),您将使用 32 个以上的内核,但只有 16 个内核。这可能应该被报告为 optaplanner 的 jira for parallelBenchmarkCount AUTO 中的一个问题。

MoveListFactory 扩展性差,消耗大量内存和 cpu。

某种动作有数十亿种可能的动作。例如,10000 个班次的 3 交换有 1 万亿次移动。这不适合几 GB 的 RAM,并且需要很长时间才能生成。

改用 MoveIteratorFactory 并且不生成移动列表,而是像默认选择器那样生成 及时。请参阅文档。