Theano/Lasagne basic neural network with regression won't overfit dataset of size one

I'm using a basic neural network in Theano/Lasagne to try to identify facial keypoints in images, and I'm currently trying to get it to learn a single image (which I've just taken from my training set). The images are 96x96 pixels, and there are 30 keypoints (outputs) to learn, but the network fails to learn them. This is my first attempt at using Theano/Lasagne, so I'm sure I've just missed something obvious, but I can't see what I'm doing wrong:

import sys
import os
import time

import numpy as np
import theano
import theano.tensor as T

import lasagne
import pickle

import matplotlib.pyplot as plt

def load_data():
    with open('FKD.pickle', 'rb') as f:
        save = pickle.load(f)
        trainDataset = save['trainDataset'] # (5000, 1, 96, 96) np.ndarray of pixel values [-1,1]
        trainLabels = save['trainLabels']   # (5000, 30) np.ndarray of target values [-1,1]
        del save  # Hint to help garbage collection free up memory

        # Overtrain on dataset of 1
        trainDataset = trainDataset[:1]
        trainLabels = trainLabels[:1]

    return trainDataset, trainLabels


def build_mlp(input_var=None):

    relu = lasagne.nonlinearities.rectify
    softmax = lasagne.nonlinearities.softmax

    network = lasagne.layers.InputLayer(shape=(None, 1, imageSize, imageSize), input_var=input_var)
    network = lasagne.layers.DenseLayer(network, num_units=numLabels, nonlinearity=softmax)

    return network

def main(num_epochs=500, minibatch_size=500):

    # Load the dataset
    print "Loading data..."
    X_train, y_train = load_data()

    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.matrix('targets')

    # Create neural network model
    network = build_mlp(input_var)

    # Create a loss expression for training, the mean squared error (MSE)
    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.squared_error(prediction, target_var)
    loss = loss.mean()

    # Create update expressions for training
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)

    # Compile a function performing a training step on a mini-batch
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Collect points for final plot
    train_err_plot = []

    # Finally, launch the training loop.
    print "Starting training..."

    # We iterate over epochs:
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        start_time = time.time()
        train_err = train_fn(X_train, y_train)

        # Then we print the results for this epoch:
        print "Epoch %s of %s took %.3fs" % (epoch+1, num_epochs, time.time()-start_time)
        print "  training loss:\t\t%s" % train_err

        # Save accuracy to show later
        train_err_plot.append(train_err)

    # Show plot
    plt.plot(train_err_plot)
    plt.title('Graph')
    plt.xlabel('Epochs')
    plt.ylabel('Training loss')
    plt.tight_layout()
    plt.show()

imageSize = 96
numLabels = 30

if __name__ == '__main__':
    main(minibatch_size=1)

This gives me a graph that looks like this:

I'm pretty sure this network should be able to drive the loss down to essentially zero. I'd appreciate any help or thoughts on the matter :)

Edit: removed the dropout and hidden layers to simplify the problem.

It turned out I had forgotten to change the output-layer nonlinearity from:

lasagne.nonlinearities.softmax

to:

lasagne.nonlinearities.linear

The code I based mine on was written for a classification problem (e.g. working out which digit a picture shows), whereas I was using the network for a regression problem (e.g. trying to find the locations of certain features in an image). Classification problems have several useful output functions, softmax being one of them, but regression problems need a linear output function to work.
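To see why, here is a minimal NumPy sketch (the shapes and targets are made up for illustration, not taken from the FKD data): softmax outputs are non-negative and sum to 1, so they can never match keypoint targets in [-1, 1], while a plain linear output layer fits a single sample to essentially zero MSE with ordinary gradient descent.

```python
import numpy as np

# Hypothetical tiny reproduction: one training sample, MSE loss,
# comparing a softmax output layer against a linear one.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1, 16))   # one flattened "image"
y = rng.uniform(-1, 1, size=(1, 4))    # keypoint-style targets in [-1, 1]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Softmax outputs lie on the probability simplex: non-negative,
# summing to 1.  They cannot reach negative targets, so the MSE
# plateaus no matter how long you train.
W = np.zeros((16, 4))
p = softmax(x @ W)
assert np.allclose(p.sum(axis=1), 1.0) and (p >= 0).all()

# With a linear output, gradient descent on the single sample
# drives the MSE to essentially zero.
W = np.zeros((16, 4))
for _ in range(2000):
    pred = x @ W                          # linear output layer
    grad = 2 * x.T @ (pred - y) / y.size  # d(MSE)/dW
    W -= 0.1 * grad
final_mse = np.mean((x @ W - y) ** 2)
print(final_mse)  # essentially zero
```

The same constraint is exactly what kept the Lasagne network's loss from falling: with `softmax` on the output `DenseLayer`, no weight setting can produce the negative target values.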

Hope this helps someone else down the line :)