Linear combinations of Zygote.Grads

Question

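I am using Flux to build and train neural network models, and I would like to know whether there is a way to take linear combinations of Zygote.Grads objects.

Here is a simple example. This is how it is typically done: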

using Flux

m = hcat(2.0); b = hcat(-1.0);  # arbitrary 1 x 1 matrices

f(x) = m*x .+ b
ps = Flux.params(m, b)  # parameters to be adjusted

inputs = [0.3 1.5]  # arbitrary 1 x 2 matrix

loss(x) = sum( f(x).^2 )

gs = Flux.gradient(() -> loss(inputs), ps)  # the typical way
@show gs[m], gs[b]  # 5.76, 3.2

But I want to do the same calculation by computing the gradients at a deeper level and then assembling them at the end. For example:

input1 = hcat(inputs[1, 1]); input2 = hcat(inputs[1, 2]);  # turn each input into a 1 x 1 matrix

grad1 = Flux.gradient(() -> f(input1)[1], ps)  # df/dp using input1 (where p is m or b)
grad2 = Flux.gradient(() -> f(input2)[1], ps)  # df/dp using input2 (where p is m or b)

predicted1 = f(input1)[1]
predicted2 = f(input2)[1]

myGrad_m = (2 * predicted1 * grad1[m]) + (2 * predicted2 * grad2[m])  # 5.76
myGrad_b = (2 * predicted1 * grad1[b]) + (2 * predicted2 * grad2[b])  # 3.2

Above, I used the chain rule and the linearity of the derivative to decompose the gradient of loss():

d(loss)/dp = d( sum(f^2) ) / dp = sum( d(f^2)/dp ) = sum( 2*f * df/dp )

I then computed df/dp with Zygote.gradient and combined the results at the end.
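As a check on the numbers quoted in the code comments, the decomposition can be worked by hand for this example. Here x_1 = 0.3 and x_2 = 1.5 are the two inputs, m = 2 and b = -1, and the symbols x_1, x_2 are introduced only for this illustration:

```latex
\begin{aligned}
f(x_1) &= 2(0.3) - 1 = -0.4, \qquad f(x_2) = 2(1.5) - 1 = 2.0,\\
\frac{\partial f}{\partial m} &= x, \qquad \frac{\partial f}{\partial b} = 1,\\
\frac{\partial\,\mathrm{loss}}{\partial m} &= 2(-0.4)(0.3) + 2(2.0)(1.5) = -0.24 + 6 = 5.76,\\
\frac{\partial\,\mathrm{loss}}{\partial b} &= 2(-0.4)(1) + 2(2.0)(1) = -0.8 + 4 = 3.2.
\end{aligned}
```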

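But note that I had to combine m and b separately. That is fine here, because there are only 2 parameters.

With 1000 parameters, however, I would like to do something like this, which is a linear combination of Zygote.Grads: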

myGrad = (2 * predicted1 * grad1) + (2 * predicted2 * grad2)

However, I get an error saying that the + and * operators are not defined for these types. How can I make this shortcut work?

Answer 1

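Simply change each * and + to .* and .+ (that is, use broadcasting), or use map to apply a function across several Grads at once. This is described in the Zygote documentation. Note that for this to work, all the Grads must share the same keys (that is, they must correspond to the same parameters).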
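Applied to the example from the question, the suggestion might look like the sketch below. It assumes a Flux/Zygote version that still supports implicit parameters (Flux.params and Zygote.Grads); newer Flux releases deprecate that style, so treat this as an illustration and not a definitive recipe. The name myGrad2 is introduced here only to show the map variant.

```julia
using Flux  # assumes a version that still provides Flux.params / Zygote.Grads

m = hcat(2.0); b = hcat(-1.0)   # arbitrary 1 x 1 matrices, as in the question
f(x) = m*x .+ b
ps = Flux.params(m, b)

input1 = hcat(0.3); input2 = hcat(1.5)

grad1 = Flux.gradient(() -> f(input1)[1], ps)  # df/dp at input1
grad2 = Flux.gradient(() -> f(input2)[1], ps)  # df/dp at input2

predicted1 = f(input1)[1]
predicted2 = f(input2)[1]

# Broadcast form: scalar .* Grads and Grads .+ Grads each give a new Grads
myGrad = (2 * predicted1) .* grad1 .+ (2 * predicted2) .* grad2

# map form: the function receives the matching entry from each Grads
myGrad2 = map((g1, g2) -> 2 * predicted1 * g1 + 2 * predicted2 * g2, grad1, grad2)

@show myGrad[m], myGrad[b]  # 1 x 1 matrices, approximately 5.76 and 3.2
```

Both forms loop over the shared parameter set internally, so the same line works unchanged whether ps holds 2 parameters or 1000, as long as grad1 and grad2 were taken with respect to the same ps.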

julia

flux.jl