Neural network for handwriting recognition?

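I have been following Andrew Ng's course on machine learning, and I currently have some doubts about the implementation of a handwriting recognition tool.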

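- First, he says that he uses a subset of the MNIST dataset that contains 5000 training examples, each of which is a 20x20 grayscale image. He then says that we have a vector of 400 elements, which is the "unrolled" form of the data described above. Does this mean that the training set has something like the following format?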

Training example 1 v[1,2,...,400]
Training example 2 v[1,2,...,400]
...
Training example 5000 v[1,2,...,400]

For the coding part, the author gives the following complete code in Matlab:

%% Machine Learning Online Class - Exercise 3 | Part 2: Neural Networks

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  linear exercise. You will need to complete the following functions 
%  in this exercise:
%
%     lrCostFunction.m (logistic regression cost function)
%     oneVsAll.m
%     predictOneVsAll.m
%     predict.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Setup the parameters you will use for this exercise
input_layer_size  = 400;  % 20x20 Input Images of Digits
hidden_layer_size = 25;   % 25 hidden units
num_labels = 10;          % 10 labels, from 1 to 10   
                          % (note that we have mapped "0" to label 10)

%% =========== Part 1: Loading and Visualizing Data =============
%  We start the exercise by first loading and visualizing the dataset. 
%  You will be working with a dataset that contains handwritten digits.
%

% Load Training Data
fprintf('Loading and Visualizing Data ...\n')

load('ex3data1.mat');
m = size(X, 1);

% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);

displayData(X(sel, :));

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ================ Part 2: Loading Parameters ================
% In this part of the exercise, we load some pre-initialized 
% neural network parameters.

fprintf('\nLoading Saved Neural Network Parameters ...\n')

% Load the weights into variables Theta1 and Theta2
load('ex3weights.mat');

%% ================= Part 3: Implement Predict =================
%  After training the neural network, we would like to use it to predict
%  the labels. You will now implement the "predict" function to use the
%  neural network to predict the labels of the training set. This lets
%  you compute the training set accuracy.

pred = predict(Theta1, Theta2, X);

fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);

fprintf('Program paused. Press enter to continue.\n');
pause;

%  To give you an idea of the network's output, you can also run
%  through the examples one at a time to see what it is predicting.

%  Randomly permute examples
rp = randperm(m);

for i = 1:m
    % Display 
    fprintf('\nDisplaying Example Image\n');
    displayData(X(rp(i), :));

    pred = predict(Theta1, Theta2, X(rp(i),:));
    fprintf('\nNeural Network Prediction: %d (digit %d)\n', pred, mod(pred, 10));

    % Pause
    fprintf('Program paused. Press enter to continue.\n');
    pause;
end

The predict function is supposed to be completed by the students, and I did it as follows:

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);
X = [ones(m , 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

a1 = X;
a2 = sigmoid(a1*Theta1');
a2 = [ones(m , 1) a2];
a3 = sigmoid(a2*Theta2');
[M , p] = max(a3 , [] , 2);

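Even though it runs, I do not fully understand how it actually works (I just followed the step-by-step instructions on the author's website). I have the following doubts: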

Any help would be appreciated. Thanks.

I took the same course some time ago.

X is the input data. So X is a matrix made up of 5000 vectors, each with 400 elements. There is no training set, because the network is pre-trained.

Normally the values of theta 1 and theta 2 are trained. How this is done is the subject of the next few lectures (the backpropagation algorithm).

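I am not entirely sure why he uses 25 neurons for the hidden layer. My guess, however, is that this number of neurons simply works without making the training step take forever.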

Let's break your question into parts:

First he says that he uses a subset of the MNIST dataset, which contains 5000 training examples and each training example is an image in a 20x20 gray scale format. With that he says that we have a vector of 400 elements that is the "unrolled" form of the data previously described. Does it mean that the train set has something like the following format? (...)

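You are on the right track. Each training example is a 20x20 image. The simplest neural network model introduced in the course treats each image as a plain 1x400 vector (this is exactly what "unrolled" means). The dataset is stored in a matrix because that lets you take advantage of the efficient linear algebra libraries used by Octave/Matlab to perform the computations faster. You do not necessarily need to store all the training examples as a 5000x400 matrix, but your code will run faster this way.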
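As a rough illustration of what "unrolling" means, here is a minimal Octave/Matlab sketch. The image contents and the names img, img2, X_demo and restored are made up for the example, and column-by-column ordering is simply what reshape and (:) do by default. I am not claiming this is exactly how the course data file was built.

```matlab
% Hypothetical 20x20 "image"; real pixel values would come from the dataset.
img = reshape(1:400, 20, 20);

% Unroll the image column by column into a 1x400 row vector.
v = img(:)';

% Stack several unrolled images as rows to form a data matrix.
img2 = img';               % a second made-up image
X_demo = [v; img2(:)'];    % 2x400 here; 5000x400 in the exercise

% Reshaping a row gives back the original 20x20 image.
restored = reshape(X_demo(1, :), 20, 20);
```

With one image per row, a whole batch can be pushed through the network with a single matrix multiplication instead of a loop over examples.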

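The author considers that X(input) is an array of 5000 x 400 elements, or it has 400 neurons as input, with 10 neurons as output and a hidden layer. Does it mean this 5000 x 400 values are the training set?

The "input layer" is nothing more than the input image. You can think of it as neurons whose output values have already been computed, or as values coming from outside the network (think of your retina: it is like the input layer of your visual system). So this network has 400 input units (the "unrolled" 20x20 image). But of course your training set does not consist of a single image, so you put all 5000 images together in a 5000x400 matrix to form the training set.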

The author gives us the values of theta 1 and theta 2, which I believe serve as weights for the calculations on the inner layer, but how are these values obtained?

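These theta values are found using an algorithm called backpropagation. If you have not had to implement it in the course yet, be patient. It will probably come up in the exercises soon! By the way, yes, they are the weights.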
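To see how the weights are used once they are known, here is a small sketch of the forward pass with the matrix shapes written out. It is not backpropagation and it does not use the real weights: the random matrices, the inline sigmoid_demo function and every name ending in _demo are assumptions for illustration. Only the layer sizes (400, 25, 10) come from the exercise script.

```matlab
% Sizes from the exercise: 400 inputs, 25 hidden units, 10 labels.
m = 5;                               % a handful of examples instead of 5000
X_demo = rand(m, 400);               % each row is one unrolled image
Theta1_demo = rand(25, 401) - 0.5;   % hidden-layer weights (400 inputs + 1 bias)
Theta2_demo = rand(10, 26) - 0.5;    % output-layer weights (25 hidden units + 1 bias)

sigmoid_demo = @(z) 1 ./ (1 + exp(-z));   % logistic function, defined inline

a1 = [ones(m, 1) X_demo];                            % m x 401, bias column added
a2 = [ones(m, 1) sigmoid_demo(a1 * Theta1_demo')];   % m x 26, hidden activations plus bias
a3 = sigmoid_demo(a2 * Theta2_demo');                % m x 10, one score per label
[~, p_demo] = max(a3, [], 2);                        % index of the highest score in each row
```

Each row of a3 holds ten scores, one per label, and the predicted label is just the column index of the largest one. That is why the labels run from 1 to 10, with 10 standing for the digit 0.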

Why does he use 25 neurons in the hidden layer and not 24 or 30?

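He probably chose an arbitrary value that neither runs too slowly nor performs too poorly. You could probably find a better value for this hyperparameter. But if you increase it too much, the training process will probably take much longer. Also, since you are only using a small part of the whole training set (the original MNIST has 60000 training examples and 28x28 images), you need to use a "small" number of hidden units to prevent overfitting. If you use too many units, your neurons will "learn by heart" the training examples and will not be able to generalize to new unseen data. Finding hyperparameters such as the number of hidden units is an art that you will master with experience (and maybe Bayesian optimization and more advanced methods, but that is another story xD).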
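To get a feeling for how quickly the model grows with the hidden layer, you can count the weights for a few hidden-layer sizes. This is only back-of-the-envelope arithmetic: the candidate sizes in hidden_sizes are arbitrary picks of mine, and the weight count is at best a crude proxy for training cost and overfitting risk.

```matlab
% Layer sizes from the exercise script; the candidate hidden sizes are arbitrary.
input_layer_size = 400;
num_labels = 10;
hidden_sizes = [10 25 50 100];

% Each unit has one weight per input plus one bias weight.
n_params = (input_layer_size + 1) .* hidden_sizes + (hidden_sizes + 1) .* num_labels;

for k = 1:numel(hidden_sizes)
    fprintf('%3d hidden units -> %d weights\n', hidden_sizes(k), n_params(k));
end
```

With 25 hidden units this already gives 10285 weights, which is more than the 5000 training examples, so going much higher on this small subset would probably need extra care (for example regularization).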