Improve performance of autograd jacobian

I would like to know how to make the code below run faster. Right now it seems unreasonably slow, and I suspect I may be using the autograd API incorrectly. The output I expect is each element of timeline evaluated in the Jacobian of f, which I do get, but it takes a very long time:

import numpy as np
from autograd import jacobian


def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z


timeline = np.linspace(1, 100, 40000)

gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]))  # full (40000, 2) Jacobian of f at (mu_, log_sigma_) = (1, 1)

My expectations were the following:

  1. jacobian(f) returns a function representing the gradient vector w.r.t. the parameters.
  2. jacobian(f)(np.array([1.0, 1.0])) is the Jacobian evaluated at the point (1, 1). To me, this should behave like a vectorised numpy computation, so it should execute very quickly even for a 40k-length array. However, that is not the case; a plain-numpy sketch of the speed I was expecting follows this list.
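
For this particular f the Jacobian also has a simple closed form, so here is a plain-numpy sketch of the fully vectorised speed I had in mind (the partial derivatives in the comments are worked out by hand, not autograd output):

import numpy as np

timeline = np.linspace(1, 100, 40000)


def analytic_jacobian(params):
    mu_, log_sigma_ = params
    # dZ/d(mu_)        = timeline / log_sigma_
    # dZ/d(log_sigma_) = -timeline * mu_ / log_sigma_ ** 2
    return np.column_stack([timeline / log_sigma_,
                            -timeline * mu_ / log_sigma_ ** 2])


J = analytic_jacobian(np.array([1.0, 1.0]))  # shape (40000, 2), runs effectively instantly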

Even something like the following has the same poor performance:

import numpy as np
from autograd import jacobian


def f(params, t):
    mu_, log_sigma_ = params
    Z = t * mu_ / log_sigma_
    return Z


timeline = np.linspace(1, 100, 40000)

gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]), timeline)

From https://github.com/HIPS/autograd/issues/439 I learned that there is an undocumented function autograd.make_jvp which computes the Jacobian with fast forward mode.

The link states:

Given a function f, vectors x and v in the domain of f, make_jvp(f)(x)(v) computes both f(x) and the Jacobian of f evaluated at x, right multiplied by the vector v.

To get the full Jacobian of f you just need to write a loop to evaluate make_jvp(f)(x)(v) for each v in the standard basis of f's domain. Our reverse mode Jacobian operator works in the same way.
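
In other words, the loop the quote describes can be written once and reused; a minimal sketch (assuming f takes a 1-D parameter array, and using only make_jvp as described above):

import autograd.numpy as np
from autograd import make_jvp


def dense_jacobian(f, x):
    # One forward-mode pass per input dimension; each pass yields one Jacobian column.
    push_forward = make_jvp(f)(x)
    columns = [push_forward(basis)[1] for basis in np.eye(x.size)]
    return np.column_stack(columns)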

Using your example:

import autograd.numpy as np
from autograd import make_jvp

def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z

timeline = np.linspace(1, 100, 40000)

gradient_at_mle = make_jvp(f)(np.array([1.0, 1.0]))

# loop through each basis
# [1, 0] evaluates (f(x), first column of jacobian)
# [0, 1] evaluates (f(x), second column of jacobian)
for basis in (np.array([1, 0]), np.array([0, 1])):
    val_of_f, col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)

Output:

[  1.           1.00247506   1.00495012 ...  99.99504988  99.99752494
 100.        ]
[  -1.           -1.00247506   -1.00495012 ...  -99.99504988  -99.99752494
 -100.        ]

This runs in about 0.005 seconds on Google Colab.

Edit:

Functions like cdf are not yet defined for the regular jvp, but you can use another undocumented function, make_jvp_reversemode, where it is defined. Usage is similar, except that the output is only the column rather than also the value of the function:

import autograd.numpy as np
from autograd.scipy.stats.norm import cdf
from autograd.differential_operators import make_jvp_reversemode


def f(params):
    mu_, log_sigma_ = params
    Z = timeline * cdf(mu_ / log_sigma_)
    return Z

timeline = np.linspace(1, 100, 40000)

gradient_at_mle = make_jvp_reversemode(f)(np.array([1.0, 1.0]))

# loop through each basis
# [1, 0] evaluates first column of jacobian
# [0, 1] evaluates second column of jacobian
for basis in (np.array([1, 0]), np.array([0, 1])):
    col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)

Output:

[0.05399097 0.0541246  0.05425823 ... 5.39882939 5.39896302 5.39909665]
[-0.05399097 -0.0541246  -0.05425823 ... -5.39882939 -5.39896302 -5.39909665]

Note that make_jvp_reversemode will be slightly faster than make_jvp by a constant factor due to its use of caching.
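
If you want to check that constant factor yourself, a rough timing sketch (using the first, cdf-free f, since the plain make_jvp does not handle cdf) would be:

import timeit

import autograd.numpy as np
from autograd import make_jvp
from autograd.differential_operators import make_jvp_reversemode

timeline = np.linspace(1, 100, 40000)


def f(params):
    mu_, log_sigma_ = params
    return timeline * mu_ / log_sigma_


x = np.array([1.0, 1.0])
bases = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))


def jacobian_via_make_jvp():
    push = make_jvp(f)(x)
    # make_jvp returns (f(x), column); keep only the column.
    return np.column_stack([push(b)[1] for b in bases])


def jacobian_via_reversemode():
    push = make_jvp_reversemode(f)(x)
    # make_jvp_reversemode returns the column directly.
    return np.column_stack([push(b) for b in bases])


print(timeit.timeit(jacobian_via_make_jvp, number=100))
print(timeit.timeit(jacobian_via_reversemode, number=100))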