TensorFlow.js: Error in gradient for op maximum. The gradient of input '$a' has shape '32,200', which does not match the shape of the input '32,1'

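I built a very simple TensorFlow.js model and everything seems to make sense, but when I call the fit function, the model fails to backpropagate the gradients and throws the error message above: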

Error in gradient for op maximum. 

The gradient of input '$a' has shape '32,200', 
which does not match the shape of the input '32,1'

Here are the shapes and dtypes of xTrain and yTrain:

xTrain
  Array(3) [2000, 20, 73]
  float32
yTrain
  Array(2) [2000, 200]
  float32

Here are the model's expected input and output:

model.input
  Array(3) [null, 20, 73]
  float32
model.outputs[0]
  Array(2) [null, 200]
  float32

[Edit] I should note that the problem only occurs when I try to use the following loss:

loss: 'cosineProximity'

Here is my code:

console.log("starting compute_and_save_model");

const model = tf.sequential();
model.add(tf.layers.simpleRNN({
    units: length_of_embedding,//amount_of_rnn_units,
    recurrentInitializer: 'glorotNormal',
    inputShape: [max_len, recogized_letters.length],
    returnSequences: false,
}));

console.log(model.input.shape);
console.log(model.input.dtype);
console.log(model.outputs[0].shape);
console.log(model.outputs[0].dtype);
console.log(model.batchInputShape);

model.compile({
    loss: 'cosineProximity',
    optimizer: 'adam',
    metrics: ['acc']
});

console.log("starting compute_and_save_model (fit)")

await model.fit(xTrain, yTrain, {
    epochs: 2,
    batchSize: 32,
    validationSplit: 0.2,
    callbacks: {
        onBatchBegin(b) {
            console.log("starting compute_and_save_model (fit:"+b+")");
        }
    }
});

Runnable from https://stackblitz.com/edit/js-ddlwge

Does anyone know what might be going wrong here?

EDIT: I tried writing my own cosineProximity implementation and got the same error. For reference, here is my implementation:

const cosine = tf.layers.dot({axes: -1,normalize:true})
loss: function(a,b) {
    return tf.neg(tf.mean(cosine.apply([a,b])));
},

OK, I spent some time on this, and it looks like a bug in the TensorFlow.js implementation.

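If you run into the same problem, you can fix it by applying the following patch yourself (I trust the tfjs-layers maintainers will eventually merge this pull request, so hopefully you won't hit this problem in the future).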

https://github.com/tensorflow/tfjs-layers/pull/499

| export function l2Normalize(x: Tensor, axis?: number): Tensor {
|   return tidy(() => {
|     const squareSum = tfc.sum(K.square(x), axis, true);
-     const epsilonTensor = tfc.mul(scalar(epsilon()), tfc.onesLike(x));
+     const epsilonTensor = tfc.mul(scalar(epsilon()), tfc.onesLike(squareSum));
|     const norm = tfc.sqrt(tfc.maximum(squareSum, epsilonTensor));
|     return tfc.div(x, norm);
|   });
| }
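
To see why the one-line change matters, here is a rough sketch of the shape arithmetic in plain JavaScript. It does not call TensorFlow.js at all; reduceKeepDims and broadcastShape are hypothetical helpers written only for this illustration, and the [32, 200] shape is taken from the error message. My reading is that before the patch, maximum broadcasts its two operands to [32, 200], so the gradient passed back for '$a' (squareSum, shape [32, 1]) arrives with the broadcast shape instead of the input shape.

```javascript
// Illustrative shape arithmetic only. These helpers are hypothetical
// and are not part of the TensorFlow.js API.

// Shape of a reduction along one axis with keepDims = true.
function reduceKeepDims(shape, axis) {
  const a = axis < 0 ? shape.length + axis : axis;
  return shape.map((d, i) => (i === a ? 1 : d));
}

// NumPy-style broadcast of two shapes, aligned from the right.
function broadcastShape(a, b) {
  const rank = Math.max(a.length, b.length);
  const out = [];
  for (let i = 0; i < rank; i++) {
    const ia = i - (rank - a.length);
    const ib = i - (rank - b.length);
    const da = ia >= 0 ? a[ia] : 1;
    const db = ib >= 0 ? b[ib] : 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error(`Cannot broadcast ${a} with ${b}`);
    }
    out.push(Math.max(da, db));
  }
  return out;
}

const x = [32, 200];                      // one batch of predictions
const squareSum = reduceKeepDims(x, -1);  // sum(square(x), axis, true)

// Before the patch: epsilonTensor is built from onesLike(x).
const outBefore = broadcastShape(squareSum, x);
// After the patch: epsilonTensor is built from onesLike(squareSum).
const outAfter = broadcastShape(squareSum, squareSum);

console.log("squareSum shape:", squareSum);
console.log("maximum output before patch:", outBefore);
console.log("maximum output after patch:", outAfter);
```

With the patch, both operands of maximum have the same shape, so no broadcasting happens and the gradient for '$a' has the same shape as squareSum.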