docplex.cp.model out-of-memory running on a cluster
I am trying to run the following docplex.cp.model on a large dataset. Here is some sample data:
import numpy as np
from docplex.cp.model import CpoModel
N = 180000
S = 10
k = 2
u_i = np.random.rand(N)[:,np.newaxis]
u_ij = np.random.rand(N*S).reshape(N, S)
beta = np.random.rand(N)[:,np.newaxis]
m = CpoModel(name = 'model')
R = range(1, S)
idx = [(j) for j in R]
I = m.binary_var_dict(idx)
m.add_constraint(m.sum(I[j] for j in R)<= k)
total_rev = m.sum(beta[i,0] / ( 1 + u_i[i,0]/sum(I[j] * u_ij[i-1,j] for j in R) ) for i in range(N) )
m.maximize(total_rev)
sol=m.solve(agent='local',execfile='/Users/Mine/Python/tf2_4_env/bin/cpoptimizer')
print(sol.get_solver_log())
I have tried running this on a cluster with the following settings:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=4571
This stops with an out-of-memory error, as shown in the output:
! --------------------------------------------------- CP Optimizer 20.1.0.0 --
! Maximization problem - 9 variables, 1 constraint
! Presolve : 360001 extractables eliminated
! Initial process time : 28.95s (28.77s extraction + 0.19s propagation)
! . Log search space : 9.0 (before), 9.0 (after)
! . Memory usage : 623.2 MB (before), 623.2 MB (after)
! Using parallel search with 28 workers.
! ----------------------------------------------------------------------------
! Best Branches Non-fixed W Branch decision
0 9 -
+ New bound is 80920.82
Traceback (most recent call last):
File "sample.py", line 22, in <module>
sol=m.solve(agent='local',execfile='/home/wbs/bstqhc/.local/bin/cpoptimizer') #agent='local',execfile='/Users/Mine/Python/tf2_4_env/bin/cpoptimizer')
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/model.py", line 1222, in solve
msol = solver.solve()
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver.py", line 775, in solve
raise e
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver.py", line 768, in solve
msol = self.agent.solve()
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver_local.py", line 209, in solve
jsol = self._wait_json_result(EVT_SOLVE_RESULT)
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver_local.py", line 545, in _wait_json_result
data = self._wait_event(evt)
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver_local.py", line 448, in _wait_event
evt, data = self._read_message()
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver_local.py", line 604, in _read_message
frame = self._read_frame(6)
File "/home/wbs/bstqhc/.local/lib/python3.7/site-packages/docplex/cp/solver/solver_local.py", line 664, in _read_frame
raise CpoSolverException("Nothing to read from local solver process. Process seems to have been stopped (rc={}).".format(rc))
docplex.cp.solver.solver.CpoSolverException: Nothing to read from local solver process. Process seems to have been stopped (rc=-9).
slurmstepd: error: Detected 2 oom-kill event(s) in step 379869.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
What I observe is that the optimization is running in parallel, as the log says Using parallel search with 28 workers, and each node has 28 cores. But it looks like only 1 node is being used.
Can you help me solve the out-of-memory problem?
By default, the CPO solver starts as many workers as there are visible cores, including hyper-threaded ones. Unfortunately, memory consumption is roughly proportional to the number of workers, which explains your out-of-memory error.
You should limit this number by adding, for example, Workers=4 to your solve call, which in your case becomes:
sol=m.solve(agent='local',execfile='/Users/Mine/Python/tf2_4_env/bin/cpoptimizer', Workers=4)
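As a rough sanity check, you can estimate the footprint for a given worker count from the per-worker figure in your own log ("Memory usage : 623.2 MB" before the parallel search starts), using the roughly-linear scaling described above. This is only a back-of-the-envelope estimate, not an exact accounting:

```python
# Rough estimate of CP Optimizer's memory footprint, assuming memory
# grows roughly linearly with the number of parallel workers.
# 623.2 MB is the pre-search "Memory usage" value from the log above.
per_worker_mb = 623.2

for workers in (28, 4):
    est_gb = workers * per_worker_mb / 1024
    print(f"{workers} workers -> ~{est_gb:.1f} GB estimated")
# 28 workers -> ~17.0 GB estimated
# 4 workers  -> ~2.4 GB estimated
```

With 28 workers the estimate is close to the whole node's allocation, so the cgroup limit on the batch step is easily exceeded; with Workers=4 the solver stays well inside it.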