Binary-CrossEntropy - Works on Keras But Not on Lasagne?

Question

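I am using the same convolutional neural network structure on Keras and Lasagne. For now, I have switched to a simpler network to see whether that changed anything, but it did not.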

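On Keras it works fine: it outputs values between 0 and 1 with good accuracy. On Lasagne, the values come out mostly wrong. It looks as if the output is the same as the input.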

Basically: it trains and outputs fine on Keras, but not in my Lasagne version.

The structure on Lasagne:

def structure(w=5, h=5):
    try:

        input_var = T.tensor4('inputs')
        target_var = T.bmatrix('targets')

        network = lasagne.layers.InputLayer(shape=(None, 1, h, w), input_var=input_var)

        network = lasagne.layers.Conv2DLayer(
            network, num_filters=64, filter_size=(3, 3), stride=1, pad=0,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

        network = lasagne.layers.Conv2DLayer(
            network, num_filters=64, filter_size=(3, 3), stride=1, pad=0,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

        network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2), stride=None, pad=(0, 0), ignore_border=True)

        network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=0.5),
            num_units=256,
            nonlinearity=lasagne.nonlinearities.rectify, W=lasagne.init.GlorotUniform())

        network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=0.5),
            num_units=1,
            nonlinearity=lasagne.nonlinearities.sigmoid)

        print  "...Output", lasagne.layers.get_output_shape(network)

        return network, input_var, target_var

    except Exception as inst:
        print ("Failure to Build NN !", inst.message, (type(inst)), (inst.args), (inst))

    return None

On Keras:

def getModel(w,h):
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation, Flatten
    from keras.layers import Convolution2D, MaxPooling2D
    from keras.optimizers import SGD

    model = Sequential()

    model.add(Convolution2D(64, 3, 3, border_mode='valid', input_shape=(1, h, w)))
    model.add(Activation('relu'))
    model.add(Convolution2D(64, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Convolution2D(128, 3, 3, border_mode='valid'))
    model.add(Activation('relu'))
    model.add(Convolution2D(128, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    #
    model.add(Flatten())
    #
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))

    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))

    #
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer='sgd')

    return model

And to train on Keras:

model.fit(x, y, batch_size=512, nb_epoch=500, verbose=2, validation_split=0.2, shuffle=True, show_accuracy=True)

And to train and predict on Lasagne:

To train:

prediction = lasagne.layers.get_output(network)

loss = lasagne.objectives.binary_crossentropy(prediction, target_var)
loss = loss.mean()

params = lasagne.layers.get_all_params(network, trainable=True)

# updates = lasagne.updates.sgd(loss, params, learning_rate=learning_rate)
updates = lasagne.updates.nesterov_momentum(loss_or_grads=loss, params=params, learning_rate=learning_rate, momentum=momentum_rho)

# Deterministic forward pass (dropout disabled) for validation
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.binary_crossentropy(test_prediction, target_var)
test_loss = test_loss.mean()

# Accuracy
test_acc = lasagne.objectives.binary_accuracy(test_prediction, target_var)
test_acc = test_acc.mean()

train_fn = theano.function([input_var, target_var], loss, updates=updates)
val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
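For reference, the quantity being minimized here is the element-wise binary cross-entropy, -(t*log(p) + (1-t)*log(1-p)), averaged over the batch. A minimal pure-Python sketch of that formula follows; the helper name and the sample numbers are illustrative only and are not taken from the question's data:

```python
import math

def binary_crossentropy(p, t):
    """Element-wise binary cross-entropy for a prediction p in (0, 1) and a target t in {0, 1}."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

# Hypothetical predictions and targets, for illustration only.
preds = [0.9, 0.2, 0.6]
targets = [1, 0, 1]

# Per-sample losses, then the batch mean (the same reduction as loss.mean()).
losses = [binary_crossentropy(p, t) for p, t in zip(preds, targets)]
mean_loss = sum(losses) / len(losses)
print(mean_loss)
```

Confident, correct predictions contribute a loss near 0, while a prediction of 0.5 contributes log(2) whatever the target is.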

I am using these iterators. I hope they are not the cause... but maybe they are?

def iterate_minibatches_getOutput(self, inputs, batchsize):
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt]

def iterate_minibatches(self, inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets)
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]
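Note that the range(0, len(inputs) - batchsize + 1, batchsize) pattern in these iterators only yields full batches, so trailing samples that do not fill a batch are skipped. A small sketch of the effect, using the validation set size reported in the question and a batch size of 500 that is purely an assumption (the Lasagne batch size is not shown):

```python
def batch_starts(n_samples, batchsize):
    """Start indices produced by the same range() pattern as the iterators above."""
    return list(range(0, n_samples - batchsize + 1, batchsize))

n_samples = 11416   # x_val size reported in the question
batchsize = 500     # assumed value, for illustration only

starts = batch_starts(n_samples, batchsize)
covered = len(starts) * batchsize   # samples that land in some full batch
skipped = n_samples - covered       # trailing samples that are never yielded
print(len(starts), covered, skipped)
```

With these numbers, 22 batches cover 11000 samples and the last 416 are never visited. In a loop that writes predictions batch by batch, the rows past the last full batch would keep whatever values the array was initialized with.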

To predict:

test_prediction = lasagne.layers.get_output(self.network, deterministic=True)
predict_fn = theano.function([self.input_var], test_prediction)


index = 0
for batch in self.iterate_minibatches_getOutput(inputs=submission_feature_x, batchsize=self.batch_size):
    y = predict_fn(batch)
    # Write this batch's outputs into the matching rows of predictions
    start = index * self.batch_size
    end = (index + 1) * self.batch_size
    predictions[start:end] = y
    index += 1

print "debug -->", predictions[0:10]
print "debug max ---->", np.max(predictions)
print "debug min ----->", np.min(predictions)

This prints:

debug --> [[ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.3252553 ]
 [ 0.32534513]]
debug max ----> 1.0
debug min -----> 0.0

The results are completely wrong. What confuses me, though, is that the output is fine on Keras.

Also, the validation accuracy never changes:

Epoch 2 of 30 took 9.5846s
  Training loss:                0.22714619
  Validation loss:              0.17278196
  Validation accuracy:          95.85454545 %
Epoch 3 of 30 took 9.6437s
  Training loss:                0.22646923
  Validation loss:              0.17249792
  Validation accuracy:          95.85454545 %
Epoch 4 of 30 took 9.6464s
  Training loss:                0.22563262
  Validation loss:              0.17235395
  Validation accuracy:          95.85454545 %
Epoch 5 of 30 took 10.5069s
  Training loss:                0.22464556
  Validation loss:              0.17226825
  Validation accuracy:          95.85454545 %
...
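One possible reading of a slowly falling loss with a frozen accuracy is that the network gives nearly the same below-0.5 output to every sample, as the debug print suggests, on a set where most labels are 0. The sketch below uses made-up labels (4 positives out of 100, not the question's data) to show the mean cross-entropy changing while the thresholded accuracy stays fixed:

```python
import math

def mean_bce(p, labels):
    """Mean binary cross-entropy when every sample gets the same prediction p."""
    total = sum(-(t * math.log(p) + (1 - t) * math.log(1 - p)) for t in labels)
    return total / len(labels)

def accuracy(p, labels, threshold=0.5):
    """Fraction of labels matched after thresholding the constant prediction p."""
    predicted = 1 if p >= threshold else 0
    return sum(1 for t in labels if t == predicted) / float(len(labels))

# Hypothetical, imbalanced labels: 96 negatives and 4 positives.
labels = [0] * 96 + [1] * 4

# Move the constant prediction towards 0: the loss changes, the accuracy does not.
for p in (0.33, 0.20, 0.10):
    print(p, round(mean_bce(p, labels), 4), accuracy(p, labels))
```

In this toy setting the accuracy is simply the fraction of negative labels, so it cannot move until some predictions cross the 0.5 threshold.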

Please help! What am I doing wrong?

These are the shapes being used:

x_train.shape  (102746, 1, 17, 17)
y_train.shape  (102746, 1)
x_val.shape  (11416, 1, 17, 17)
y_val.shape  (11416, 1)

Answer 1

The problem was:

target_var = T.bmatrix('targets')

It should be a float32 matrix instead of an int8 one:

target_var = T.fmatrix('targets')

Also, the learning rate was too low.

And in the Keras script, there was another error:

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer='sgd')

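It should pass the configured sgd object rather than the string 'sgd', which makes Keras use a default-configured SGD and ignore the settings above: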

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
