How to speed up the Ipopt solver?

I'd like to solve the following (relaxed, i.e. v(t) ∈ [0, 1]) optimal control problem with cyipopt:
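
(The problem statement itself is missing here; the following is only a reconstruction from the discretized code below, so read it as a sketch of the problem rather than the original formulation.)

\begin{aligned}
\min_{x,\,v} \quad & \int_0^{12} (x_1(t)-1)^2 + (x_2(t)-1)^2 \,\mathrm{d}t \\
\text{s.t.} \quad  & \dot{x}_1(t) = x_1(t) - x_1(t)\,x_2(t) - 0.4\,x_1(t)\,v(t), \\
                   & \dot{x}_2(t) = -x_2(t) + x_1(t)\,x_2(t) - 0.2\,x_2(t)\,v(t), \\
                   & x_1(0) = 0.5, \quad x_2(0) = 0.7, \\
                   & v(t) \in [0, 1] \quad \text{for all } t \in [0, 12].
\end{aligned}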

This is how I currently solve the discretized problem:

import numpy as np
import matplotlib.pyplot as plt
from cyipopt import minimize_ipopt
from scipy.optimize._numdiff import approx_derivative

# z = (x1(t0) .... x1(tN) x2(t0) .... x2(tN) v(t0) .... v(tN))^T

def objective(z, time):
    x0, x1, v = np.split(z, 3)
    res = 0.0
    for i in range(time.size-1):
        h = time[i+1] - time[i]
        res += h*((x0[i]-1)**2 + (x1[i]-1)**2)
    return res


def ode_rhs(t, x, v):
    x0, x1 = x
    xdot1 = x0 - x0*x1 - 0.4*x0*v
    xdot2 = -x1 + x0*x1 - 0.2*x1*v
    return np.array([xdot1, xdot2])


def constraint(z, time):
    x0, x1, v = np.split(z, 3)
    x = np.array([x0, x1])
    res = np.zeros((2, x0.size))

    # initial values
    res[:, 0] = x[:, 0] - np.array([0.5, 0.7])

    # 'solve' the ode-system
    for j in range(time.size-1):
        h = time[j+1] - time[j]
        # implicit Euler scheme
        res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
    return res.flatten()


# time grid
tspan = [0, 12]
dt    = 0.1
time  = np.arange(tspan[0], tspan[1] + dt, dt)

# initial point
z0 = 0.1 + np.zeros(time.size*3)

# variable bounds
bnds = [(None, None) if i < 2*time.size else (0, 1) for i in range(z0.size)]

# constraints:
cons = [{
    'type': 'eq', 
    'fun': lambda z: constraint(z, time), 
    'jac': lambda z: approx_derivative(lambda zz: constraint(zz, time), z)
}]

# call the solver
res = minimize_ipopt(lambda z: objective(z, time), x0=z0, bounds=bnds, 
                     constraints=cons, options = {'disp': 5})

The code works as expected. However, it runs quite slowly. Any ideas on how I can speed up the solver?

Analyzing Ipopt's output

Total CPU secs in IPOPT (w/o function evaluations)   =     30.153
Total CPU secs in NLP function evaluations           =    203.782

we can see that your function evaluations are the bottleneck. So let's try to profile your code, as Tom suggested in the comments:

In [2]: %timeit objective(z0, time)
307 µs ± 6.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit constraint(z0, time)
1.38 ms ± 4.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Okay, not bad. But we can do better. As a rule of thumb, avoid loops in numerical Python code wherever possible. You can find some numpy best practices in, for example, Jake VanderPlas' excellent talk at PyCon 2015. Your objective is equivalent to:

def objective(z, time):
    x0, x1, v = np.split(z, 3)
    h = time[1:] - time[:-1]
    return np.sum(h*((x0[:-1]-1)**2 + (x1[:-1]-1)**2))
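
As an optional sanity check, one can verify that the vectorized version yields the same value as the loop-based one; here `objective_loop` is just a hypothetical alias for the original loop implementation, kept around for comparison:

# optional sanity check: the vectorized objective should match the loop version
# (objective_loop is a hypothetical alias for the original loop-based implementation)
assert np.isclose(objective_loop(z0, time), objective(z0, time))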

Similarly, you can eliminate the loop in your constraint function. Note that

# 'solve' the ode-system
for j in range(time.size-1):
    h = time[j+1] - time[j]
    # implicit Euler scheme
    res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])

is the same as

h = time[1:] - time[:-1]
res[:, 1:] = x[:, 1:] - x[:, :-1] - h * ode_rhs(time[1:], x[:, 1:], v[:-1])
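
The same kind of sanity check works for the constraint (again, `constraint_loop` is a hypothetical alias for the original loop-based version):

# optional sanity check: vectorized and loop-based residuals should coincide
assert np.allclose(constraint_loop(z0, time), constraint(z0, time))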

Timing the functions again, we get

In [4]: %timeit objective(z0, time)
31.8 µs ± 683 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit constraint(z0, time)
54.1 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That's a speed-up by a factor of 10x and 25x, respectively! Consequently, we can significantly reduce the solver's runtime:

Total CPU secs in IPOPT (w/o function evaluations)   =     30.906
Total CPU secs in NLP function evaluations           =     46.950

However, note that computing the gradient and the Jacobian numerically by finite differences is still computationally expensive and prone to rounding errors:

In [6]: %timeit approx_derivative(lambda zz: objective(zz, time), z0)
232 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit approx_derivative(lambda zz: constraint(zz, time), z0)
642 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Instead, we can go one step further and compute both of them exactly by algorithmic differentiation (AD), using the jax library:

import jax
# enable 64 bit floating point precision
jax.config.update("jax_enable_x64", True)

import jax.numpy as np
from jax import grad, jacfwd, jit
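
In case grad and jacfwd are unfamiliar: grad returns the exact gradient of a scalar-valued function and jacfwd the Jacobian of a vector-valued one, both via AD rather than finite differences. A tiny, self-contained illustration (unrelated to the control problem, and relying on np being jax.numpy at this point):

# grad differentiates a scalar function exactly; here d/dx sum(x**2) = 2*x
square_sum = lambda x: np.sum(x**2)
print(grad(square_sum)(np.array([1.0, 2.0])))  # prints [2. 4.]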

Then, we only need to change the constraint function as follows (the objective works with jax.numpy unchanged, since it doesn't assign to array elements):

def constraint(z, time):
    x0, x1, v = np.split(z, 3)
    x = np.array([x0, x1])
    res = np.zeros((2, x0.size))

    # initial values
    res = res.at[:, 0].set(x[:, 0] - np.array([0.5, 0.7]))
    h = time[1:] - time[:-1]
    res = res.at[:, 1:].set(x[:, 1:] - x[:, :-1] - h*ode_jit(time[1:], x[:, 1:], v[:-1]))
    return res.flatten()

since jax arrays don't support item assignment, see here. Next, we just-in-time (jit) compile the functions:

# jit the functions
ode_jit = jit(ode_rhs)
obj_jit = jit(lambda z: objective(z, time))
con_jit = jit(lambda z: constraint(z, time))

# Build and jit the derivatives
obj_grad = jit(grad(obj_jit))   # objective gradient
con_jac = jit(jacfwd(con_jit))  # constraint jacobian

# Dummy first call in order to compile the functions
print("Compiling the functions...")
_ = obj_jit(z0), con_jit(z0), obj_grad(z0), con_jac(z0)
print("Done.")

Timing again, we get

In [10]: %timeit obj_grad(z0)
62.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit con_jac(z0)
204 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That's a speed-up of roughly 3700x and 3100x compared to the finite-difference approximations above. Finally, we can pass the exact gradient and Jacobian to the solver:

# constraints:
cons = [{'type': 'eq', 'fun': con_jit, 'jac': con_jac}]

# call the solver
res = minimize_ipopt(obj_jit, x0=z0, jac=obj_grad, bounds=bnds,
                     constraints=cons, options={'disp': 5})

and obtain

Total CPU secs in IPOPT (w/o function evaluations)   =     35.348
Total CPU secs in NLP function evaluations           =      1.691
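
For completeness, here is a quick way one could inspect the solution afterwards. This is only a sketch: it assumes the solver converged, that res follows the scipy-style OptimizeResult interface (optimal point in res.x), and it reuses the variable ordering from the comment at the top of the code (x1, x2, v stacked in that order) as well as the matplotlib import from the original script; note that np is bound to jax.numpy at this point.

# split the optimal point back into states and control
x1_opt, x2_opt, v_opt = np.split(res.x, 3)

plt.plot(time, x1_opt, label="x1")
plt.plot(time, x2_opt, label="x2")
plt.step(time, v_opt, where="post", label="v")
plt.xlabel("t")
plt.legend()
plt.show()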