How to write a JAX custom vector-Jacobian product (vjp) for softmax
To understand JAX's reverse-mode automatic differentiation, I tried to write a custom_vjp for softmax, like this:
import jax
import jax.numpy as jnp
import numpy as np

@jax.custom_vjp
def stablesoftmax(x):
    print(f"input: {x} shape: {x.shape}")
    expc = jnp.exp(x - jnp.amax(x))
    return expc / jnp.sum(expc)

def ssm_fwd(x):
    s = stablesoftmax(x)
    return s, s

def ssm_bwd(acts, d_dacts):
    dacts_dinput = jnp.diag(acts) - jnp.outer(acts, acts)  # Jacobian
    d_dinput = jnp.dot(d_dacts, dacts_dinput)  # Vector-Jacobian product
    print(f"Saved activations:\n{acts} shape: {acts.shape}")
    print(f"d/d_acts:\n{d_dacts} shape: {d_dacts.shape}")
    print(f"d_acts/d_input (Jacobian of softmax):\n{dacts_dinput} shape: {dacts_dinput.shape}")
    print(f"d/d_input:\n{d_dinput} shape: {d_dinput.shape}")
    return d_dinput

stablesoftmax.defvjp(ssm_fwd, ssm_bwd)

print(f"JAX version: {jax.__version__}")
y = np.array([1., 2., 3.])
a = stablesoftmax(y)
softmax_jac_fun = jax.jacrev(stablesoftmax)
dsoftmax_dy = softmax_jac_fun(y)
print(f"Softmax Jacobian: {dsoftmax_dy}")
But when I call jacrev, I get an error saying that the structure of the VJP result does not match the structure of the softmax input:
JAX version: 0.2.13
input: [1. 2. 3.] shape: (3,)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
input: [1. 2. 3.] shape: (3,)
Saved activations:
[0.09003057 0.24472848 0.66524094] shape: (3,)
d/d_acts:
Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]], dtype=float32)
batch_dim = 0 shape: (3,)
d_acts/d_input (Jacobian of softmax):
[[ 0.08192507 -0.02203305 -0.05989202]
[-0.02203305 0.18483645 -0.1628034 ]
[-0.05989202 -0.1628034 0.22269544]] shape: (3, 3)
d/d_input:
Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[ 0.08192507, -0.02203305, -0.05989202],
[-0.02203305, 0.18483645, -0.1628034 ],
[-0.05989202, -0.1628034 , 0.22269544]], dtype=float32)
batch_dim = 0 shape: (3,)
Traceback (most recent call last):
File "analysis/vjp_test.py", line 30, in <module>
dsoftmax_dy = softmax_jac_fun(y)
jax._src.source_info_util.JaxStackTraceBeforeTransformation: TypeError: Custom VJP rule must produce an output with the same container (pytree) structure as the args tuple of the primal function, and in particular must produce a tuple of length equal to the number of arguments to the primal function, but got VJP output structure PyTreeDef(*) for primal input structure PyTreeDef((*,)).
But as you can see, when I print the shapes they are all (3,), yet JAX seems to disagree? (The values actually flowing through are 3 x 3 matrices, but that is because jacrev vmaps the VJP, pulling back the entire standard basis of R^3, i.e. the 3x3 identity matrix, in one shot.)
Note: I get the same error if I use jax.grad or jax.vjp directly.
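To see why the cotangent arrives as a batched tracer, here is a rough sketch (my own, not JAX's actual source) of what jacrev does for a single 1-D argument:

import jax
import jax.numpy as jnp

def my_jacrev(f):
    # Roughly what jax.jacrev does: pull back every standard basis
    # vector of the output space through the VJP in one batched call.
    def jacfun(x):
        y, pullback = jax.vjp(f, x)
        basis = jnp.eye(y.size, dtype=y.dtype)  # standard basis of the output space
        rows, = jax.vmap(pullback)(basis)       # one VJP per basis vector, vmapped
        return rows
    return jacfun

Applied to stablesoftmax, this feeds the 3x3 identity through the backward rule in a single call, which is exactly the Traced<...BatchTrace...> value printed for d_dacts above.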
According to the custom_vjp docs:

The output of bwd must be a tuple of length equal to the number of arguments of the primal function.
So the return statement in the backward pass should look like this:
def ssm_bwd(acts, d_dacts):
    ...
    return (d_dinput,)
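Putting it together, a minimal fixed version of the backward rule plus a sanity check (my addition, using jax.nn.softmax as the reference implementation):

def ssm_bwd(acts, d_dacts):
    dacts_dinput = jnp.diag(acts) - jnp.outer(acts, acts)  # softmax Jacobian
    d_dinput = jnp.dot(d_dacts, dacts_dinput)              # vector-Jacobian product
    return (d_dinput,)  # a tuple: one cotangent per primal argument

stablesoftmax.defvjp(ssm_fwd, ssm_bwd)

y = jnp.array([1., 2., 3.])
print(jnp.allclose(jax.jacrev(stablesoftmax)(y),
                   jax.jacrev(jax.nn.softmax)(y), atol=1e-6))  # True

The tuple is required even for a single argument because JAX matches the VJP output pytree against the primal argument tuple, which is what the PyTreeDef(*) vs PyTreeDef((*,)) mismatch in the error message is saying.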