TensorFlow.js: Error in gradient for op maximum. The gradient of input '$a' has shape '32,200', which does not match the shape of the input '32,1'
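I built a very simple TensorFlow.js model and everything seems to make sense, but when I call the fit function, the model cannot backpropagate the gradient and fails with the following error: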
Error in gradient for op maximum.
The gradient of input '$a' has shape '32,200',
which does not match the shape of the input '32,1'
Below are the shapes and dtypes of xTrain and yTrain:
xTrain
Array(3) [2000, 20, 73]
float32
yTrain
Array(2) [2000, 200]
float32
Here are the model's expected input and output:
model.input
Array(3) [null, 20, 73]
float32
model.outputs[0]
Array(2) [null, 200]
float32
[Edit] I should note that the problem only occurs when I use loss: 'cosineProximity'.
Here is my code:
console.log("starting compute_and_save_model");
const model = tf.sequential();
model.add(tf.layers.simpleRNN({
units: length_of_embedding,//amount_of_rnn_units,
recurrentInitializer: 'glorotNormal',
inputShape: [max_len, recogized_letters.length],
returnSequences: false,
}));
console.log(model.input.shape);
console.log(model.input.dtype);
console.log(model.outputs[0].shape);
console.log(model.outputs[0].dtype);
console.log(model.batchInputShape);
model.compile({
loss: 'cosineProximity',
optimizer: 'adam',
metrics: ['acc']
});
console.log("starting compute_and_save_model (fit)")
await model.fit(xTrain, yTrain, {
epochs: 2,
batchSize: 32,
validationSplit: 0.2,
callbacks: {
onBatchBegin(b) {
console.log("starting compute_and_save_model (fit:"+b+")");
}
}
});
Runnable from https://stackblitz.com/edit/js-ddlwge
Does anyone know what might be going wrong here?
EDIT: I tried writing my own cosineProximity implementation and got the same error. For reference, here is that implementation:
const cosine = tf.layers.dot({axes: -1,normalize:true})
loss: function(a,b) {
return tf.neg(tf.mean(cosine.apply([a,b])));
},
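What this loss is meant to compute can be sketched in plain JavaScript, without TensorFlow.js: normalize each row of both inputs to unit length, take the dot product along the last axis, and return the negative mean. The helper names and the EPSILON value below are assumptions made for illustration; they are not taken from the library.

```javascript
// Plain-JavaScript reference sketch; nothing here calls TensorFlow.js.
const EPSILON = 1e-7; // assumed value, used only to avoid dividing by zero

// Scale one row (a 1-D array) to unit L2 norm.
function l2NormalizeRow(row) {
  const squareSum = row.reduce((sum, v) => sum + v * v, 0);
  const norm = Math.sqrt(Math.max(squareSum, EPSILON));
  return row.map((v) => v / norm);
}

// Negative mean cosine similarity over a batch of rows.
function cosineProximity(yTrue, yPred) {
  let total = 0;
  for (let i = 0; i < yTrue.length; i++) {
    const t = l2NormalizeRow(yTrue[i]);
    const p = l2NormalizeRow(yPred[i]);
    total += t.reduce((sum, v, j) => sum + v * p[j], 0);
  }
  return -total / yTrue.length;
}

// Rows pointing in the same direction give the minimum loss.
const aligned = cosineProximity([[1, 0], [0, 2]], [[3, 0], [0, 5]]);
// Rows pointing in opposite directions give the maximum loss.
const opposed = cosineProximity([[1, 0]], [[-1, 0]]);

console.log(aligned); // -1
console.log(opposed); // 1
```

Note that the normalization reduces each row of 200 values to a single norm, so a batch of shape [32, 200] yields per-row sums of shape [32, 1].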
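OK, I spent some time on this, and it looks like a bug in the TensorFlow.js implementation.
If you run into the same problem, you can fix it by applying the following patch yourself (I trust the tfjs-layers maintainers will eventually merge this pull request, so hopefully you will not hit this issue in the future).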
| export function l2Normalize(x: Tensor, axis?: number): Tensor {
| return tidy(() => {
| const squareSum = tfc.sum(K.square(x), axis, true);
- const epsilonTensor = tfc.mul(scalar(epsilon()), tfc.onesLike(x));
+ const epsilonTensor = tfc.mul(scalar(epsilon()), tfc.onesLike(squareSum));
| const norm = tfc.sqrt(tfc.maximum(squareSum, epsilonTensor));
| return tfc.div(x, norm);
| });
| }
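The effect of the patch can be seen from the shapes alone. The sketch below uses a hypothetical helper, broadcastShape, which is not part of TensorFlow.js and only mimics the usual broadcasting rule. The shapes [32, 200] and [32, 1] come from the error message, and I am assuming the sum runs over the last axis.

```javascript
// Hypothetical helper (not a TensorFlow.js API): result shape of broadcasting a with b.
function broadcastShape(a, b) {
  const rank = Math.max(a.length, b.length);
  const out = [];
  for (let i = 0; i < rank; i++) {
    // Align from the trailing dimension; a missing dimension counts as 1.
    const da = a[a.length - 1 - i] ?? 1;
    const db = b[b.length - 1 - i] ?? 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error(`Shapes [${a}] and [${b}] are not broadcastable`);
    }
    out.unshift(da === 1 ? db : da);
  }
  return out;
}

const xShape = [32, 200];       // one batch of model outputs
const squareSumShape = [32, 1]; // sum over the last axis with keepDims = true

// Original line: the epsilon tensor is shaped like x.
const before = broadcastShape(squareSumShape, xShape);
// Patched line: the epsilon tensor is shaped like squareSum.
const after = broadcastShape(squareSumShape, squareSumShape);

console.log(before); // [ 32, 200 ]
console.log(after);  // [ 32, 1 ]
```

With the original line, maximum turns a [32, 1] input into a [32, 200] result, which matches the two shapes reported in the error. With the patched line both operands are [32, 1], so no broadcasting happens inside maximum and the gradient keeps the shape of its input.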