(matlab) MLP with ReLU and softmax not working with mini-batch SGD and produces similar predictions on the MNIST dataset
I implemented a multilayer perceptron with one hidden layer on the MNIST dataset. The hidden layer's activation function is leaky (0.01) ReLU, and the output layer uses softmax. The learning method is mini-batch SGD, and the network structure is 784*30*10. The problem is that the predictions the network makes are very similar across input samples: the model almost always wants to classify an image as one particular digit. Thanks to @Lemm Ras for pointing out the data-label mismatch in the earlier data_shuffle function, which is now fixed. But after some batches of training, I find the predictions are still somewhat similar, which is confusing.
Another problem is that the update values are too small relative to the original weights. In the MLP code I added the variables 'cc' and 'dd' to record the ratio between each weight update and the weight itself:
cc=W_OUTPUT_Update./W_OUTPUT;
dd=W_MLP_Update./W_MLP;
During debugging, the magnitude of cc is about 10^-4 (0.0001), and dd is also about 10^-4. This may be why the accuracy doesn't seem to improve much (a finite-difference check on this gradient is sketched after the training function below).
After several days of debugging I still don't know why this happens or how to fix it; it has had me stuck for a week. Can anyone help?
The screenshot shows the values of A2 after the softmax function.
[dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data(); %initialize str
images=images(:,1:10000); % for debugging, get part of whole data set
labels=labels(1:10000,1);
labels_matrix=labels_matrix(:,1:10000);
test_images=test_images(:,1:500);
test_labels=test_labels(1:500,1);
train_amount=10000;
test_amount=500;
% initialize the structure
[ W_MAD, W_MLP, W_OUTPUT] = initialize_structure(dimension, train_amount, test_amount);
epoch=100;
correct_rate=zeros(1,epoch); %record testing accuracy
corr=zeros(1,epoch); %record training accuracy
lr=0.2;
lamda=0;
batch_size=50;
for i=1:epoch
    sprintf('MLP in iteration %d over %d', i, epoch)
    % shuffle data
    [labels_shuffled, labels_matrix_shuffled, images_shuffled]=shuffle_data(labels, labels_matrix, images);
    [cor, W_MLP, W_OUTPUT] = train_mlp_relu(lr, leaky, lamda, momentum_gamma, batch_size, W_MLP, W_OUTPUT, W_MAD, power, images_shuffled, train_amount, labels_shuffled, labels_matrix_shuffled);
    corr(i)=cor/train_amount;
    % test
    correct_rate(i) = structure_test(W_MAD, W_MLP, W_OUTPUT, test_images, test_labels, test_amount);
end
% plot results
plot(1:epoch,correct_rate);
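The script records training accuracy in corr but only plots correct_rate; plotting both curves makes it easier to tell whether the model is learning at all. A small optional addition:
plot(1:epoch, correct_rate, 1:epoch, corr); % test vs. training accuracy
legend('test accuracy', 'training accuracy');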
Here is the MLP training function. Please ignore the L2 regularization parameter lamda; it is currently set to 0.
% Signature reconstructed from the call in the main script above.
function [ cor, W_MLP, W_OUTPUT ] = train_mlp_relu(lr, leaky, lamda, momentum_gamma, batch_size, W_MLP, W_OUTPUT, W_MAD, power, images, train_amount, labels, labels_matrix)
% MLP training with mini-batch size batch_size
cor=0;
%leaky=(1/batch_size);
leaky=0.001;
for i=1:train_amount/batch_size
    batch_images=images(:,batch_size*(i-1)+1:batch_size*i);
    batch_labels=labels_matrix(:,batch_size*(i-1)+1:batch_size*i);
    % forward pass: input layer to hidden layer
    V1=W_MLP'*batch_images;
    V1(1,:)=1; % set bias unit to 1
    V1_derivative=ones(size(V1));
    V1_derivative(V1<0)=leaky; % leaky-ReLU gradient: 1 for positive inputs, 'leaky' otherwise
    A1=relu(V1,leaky); % A stands for activation
    V2=W_OUTPUT'*A1;
    A2=softmax(V2);
    % TODO: move these scope-control blocks into functions.
    % training accuracy on this batch
    [val, idx]=max(A2);
    idx=idx-1; % matrix index idx runs 1..10 while labels run 0..9
    res=labels(batch_size*(i-1)+1:batch_size*i)-idx';
    cor=cor+sum(res(:)==0);
    % softmax loss gradient
    delta_softmax=-(1/batch_size)*(batch_labels-A2);
    delta_output=W_OUTPUT*delta_softmax.*V1_derivative;
    % weight updates
    W_OUTPUT_Update=lr*(1/batch_size)*A1*delta_softmax'+lamda*W_OUTPUT;
    cc=W_OUTPUT_Update./W_OUTPUT; % update-to-weight ratio, for debugging
    W_MLP_Update=lr*(1/batch_size)*batch_images*delta_output'+lamda*W_MLP;
    dd=W_MLP_Update./W_MLP; % update-to-weight ratio, for debugging
    k=mean(A2,2); % mean prediction per class, for inspection
    W_OUTPUT=W_OUTPUT-W_OUTPUT_Update;
    W_MLP=W_MLP-W_MLP_Update;
end
end
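On the concern that the updates are only ~10^-4 of the weights: a finite-difference check can tell whether the analytic gradient itself is off by a scale factor. A rough sketch, meant to run inside the mini-batch loop, where batch_loss is a hypothetical helper that recomputes the mean cross-entropy of the current mini-batch for a given W_OUTPUT:
eps_fd = 1e-5;
r = 2; c = 3; % arbitrary weight to probe
Wp = W_OUTPUT; Wp(r,c) = Wp(r,c) + eps_fd;
Wm = W_OUTPUT; Wm(r,c) = Wm(r,c) - eps_fd;
numeric = (batch_loss(Wp) - batch_loss(Wm)) / (2*eps_fd); % numeric gradient
analytic = A1*delta_softmax'; % analytic gradient of the mean loss
fprintf('numeric %.3g vs analytic %.3g\n', numeric, analytic(r,c));
If the two disagree by a constant factor such as batch_size, that would point to a duplicated 1/batch_size somewhere in the update expressions (note that delta_softmax already carries one 1/batch_size and the update multiplies by another).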
Here is the softmax function:
function [ val ] = softmax( val )
val=exp(val);
val=val./repmat(sum(val),10,1);
end
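One caveat with this implementation: exp can overflow for large logits. A numerically stable variant subtracts the column-wise maximum before exponentiating; subtracting a constant per column leaves the softmax output unchanged. A sketch (softmax_stable is not part of the original code):
function [ val ] = softmax_stable( val )
val = val - repmat(max(val, [], 1), size(val, 1), 1); % shift so each column's max logit is 0
val = exp(val);
val = val ./ repmat(sum(val, 1), size(val, 1), 1); % normalize each column
end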
labels_matrix is the target output matrix for A2, created as:
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
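For clarity, the sparse/full pair builds a 10×N one-hot matrix: column j has a single 1 in row labels(j)+1. A tiny worked example (the explicit size arguments 10 and 2 are added here so the shape is unambiguous even with only two labels):
labels_demo = [3; 0]; % two sample digits
one_hot = full(sparse(labels_demo+1, 1:2, 1, 10, 2)); % 10x2 one-hot matrix
% column 1 has its 1 in row 4 (digit 3); column 2 in row 1 (digit 0)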
And the ReLU:
function [ val ] = relu( val, leaky )
neg = val < 0; % leaky ReLU: scale negative inputs by 'leaky'
val(neg) = leaky * val(neg);
end
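Since the training function builds V1_derivative separately from the activation, one way to keep the two from drifting apart is to compute them together from a shared mask. A minimal sketch (relu_with_grad is a hypothetical helper, not part of the original code):
function [ val, deriv ] = relu_with_grad( val, leaky )
neg = val < 0; % one mask shared by activation and derivative
deriv = ones(size(val));
deriv(neg) = leaky;
val(neg) = leaky * val(neg);
end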
Data shuffling:
%this version is wrong: it only shuffles labels and data without applying the same shuffle to 'labels_matrix', which is used to compute the MLP's output-layer delta. That destroyed the link between data and labels.
% function [ label, data ] = shuffle_data( label, data )
% [row column]=size(data);
% array=randperm(column);
% data=data(:,array);
% label=label(array);
% %if shuffle respect to row then use the code below
% %data=data(randperm(row),:);
% end
function [ label, label_matrix, data ] = shuffle_data( label, label_matrix, data )
[row, column]=size(data);
array=randperm(column);
data=data(:,array);
label=label(array);
label_matrix=label_matrix(:, array);
%if shuffle respect to row then use the code below
%data=data(randperm(row),:);
end
Data loading:
function [ dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data()
%%load training and testing data, labels
data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-images.idx3-ubyte';
label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-labels.idx1-ubyte';
test_data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-images.idx3-ubyte';
test_label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-labels.idx1-ubyte';
images = loadMNISTImages(data_location);
labels = loadMNISTLabels(label_location);
test_images=loadMNISTImages(test_data_location);
test_labels=loadMNISTLabels(test_label_location);
%%data centralization
[dimension train_amount]=size(images);
[dimension test_amount]=size(test_images);
%%complete normalization
%%transform labels from index to matrix in order to apply square loss function in output layer
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
end
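The %%data centralization and %%complete normalization markers above have no code behind them. If the loader returned raw 0-255 bytes, a minimal rescaling would look like the sketch below; note that the common UFLDL loadMNISTImages already divides by 255, in which case this step is unnecessary:
images = double(images) / 255; % rescale pixels to [0, 1] (assumes raw bytes)
test_images = double(test_images) / 255;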
When you shuffle the images, the data-label association is lost. Since this association must be preserved, you need to apply the same shuffling to both the data and the labels.
For instance, you can create an external list of shuffled indices, shuffled=randperm(N), where N is the number of images, and then pass to the training method the images and labels addressed by that shuffled list.
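A minimal sketch of that suggestion in MATLAB, using one shared permutation for every array that must stay aligned (variable names follow the question's code):
N = size(images, 2); % number of training images
shuffled = randperm(N); % one shared shuffling index list
images = images(:, shuffled);
labels = labels(shuffled);
labels_matrix = labels_matrix(:, shuffled);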