How to model structured parameters in TensorFlow Probability distributions?

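I want to model a multivariate distribution with structured parameters, for example a multivariate normal whose covariance matrix is the sum of a low-rank part and a diagonal part. What is the recommended way to do this? (TensorFlow 2.8)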

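import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

DIM = 4
mean = tf.Variable(np.zeros(DIM), dtype=tf.float32, name='mean')
low_rank = tf.Variable(np.zeros((DIM, 2)), dtype=tf.float32, name='cov')
diagonal = tf.Variable(np.zeros(DIM), dtype=tf.float32, name='noise')

# Covariance = low_rank @ low_rank^T + diag(softplus(diagonal)),
# evaluated eagerly here, before the distribution is constructed.
target_distribution = tfd.MultivariateNormalTriL(
    loc=mean,
    scale_tril=tf.linalg.cholesky(
        tf.linalg.matmul(low_rank, low_rank, transpose_b=True)
        + tf.linalg.diag(tf.math.softplus(diagonal))
    )
)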

print(target_distribution.trainable_variables)

This prints (<tf.Variable 'mean:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>,), i.e. only variables that are passed in directly end up among the tracked variables, not those that enter through an expression.

What is the syntax to make low_rank and diagonal trainable variables that I can fit to data?

I know that tfd.MultivariateNormalDiagPlusLowRank covers this specific example, but I am still interested in the recommended way to model structured parameters in general.

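When you run any TF op on a tf.Variable (in eager mode), the variable's value is read into a tensor and a new value is computed, losing any association with the variable. So in your example, cholesky and matmul both happen before the Distribution is constructed, and it never sees those variables.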
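To make that concrete, here is a minimal sketch using plain TensorFlow in eager mode. The variable name `v` and the values are only for illustration.

```python
import tensorflow as tf

v = tf.Variable([0.0, 1.0], name="v")

# Applying an op in eager mode reads the variable's current value and
# returns a plain tensor; the result keeps no reference to `v`.
y = tf.math.softplus(v)
is_variable = isinstance(y, tf.Variable)

# Updating the variable afterwards does not change the already computed tensor.
before = y.numpy().copy()
v.assign([5.0, 5.0])
unchanged = bool((y.numpy() == before).all())

print(is_variable, unchanged)  # False True
```

Anything built from `y` (such as a distribution) only ever sees that fixed tensor, which is why the variable does not show up in trainable_variables.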

In TFP we have built a few utilities to address this kind of problem, in particular tfp.util.DeferredTensor, tfp.util.TransformedVariable, and tfp.experimental.util.DeferredModule. Each of these aims to allow lazy evaluation/construction of some object. TransformedVariable is nice because it also handles updating the underlying variable in pre-transform space. It is limited in that it can only have a single underlying Variable, and your example suggests you'll want several. Check out the examples in the DeferredModule documentation; it might get you what you want. You may also want to parameterize some composition of [tf.linalg.LinearOperators](https://www.tensorflow.org/api_docs/python/tf/linalg/LinearOperator) with a few variables, or something along those lines.

Here is the example above rewritten using DeferredModule: https://colab.research.google.com/drive/1DRX_Jv58abfsWE6h1BIz6YiQRAzCmn8r